Need some ideas/help on the best way to approach a new data system design. Basically, the way this will work is that there will be a bunch of different databases/tables that will need to be updated on a regular (daily/weekly/monthly) basis with new records.
The people who will be imputing the data will be proficient in Excel. The input process will be done via a simple upload form. Then the system needs to add what was imported to the existing data in the databases. There also needs to be a "rollback" process that'll reset the database to any day within the last week.
There will be approximately 30 to 50 different data sources. The primary interface will be an online search area, so all of the records need to be indexed/searchable.
Ideas/thoughts on how best to approach this? It needs to be built mostly out of PHP/MySQL.
imputing the data
Typo?
What you are asking for takes people with several years of formal training to do. Conventionally, the approach would be to draw up a set of requirements, then a set of formal specifications; then the architecture of the system would be designed, then the data design, then the code implementation. There are other approaches which tend to shortcut this. However, even in the case of a single table (although it does not necessarily follow that one "simple upload form" corresponds to one table), with a single developer there's a couple of days' work before any part of the design could be finalised, the majority of which is finding out what the system is supposed to do. But you've given no indication of the usage or data complexity of the system.
Also, what do you mean by "upload"? That implies they'll be manipulating the data elsewhere and uploading files, rather than inputting values directly.
You can't adequately describe the functionality of a complete system in a 9 line SO post.
You're unlikely to find people here to do your work for free.
You're not going to get the information you're asking for in a S.O. answer.
You seem to be struggling to use the right language to describe the facts you know.
Your question is very vague.
Actually, I have been learning PHP for the last couple of months, and I am now at a stage where I can program small things like a simple login page in PHP and MySQL, or a contact form. I have written a lot of code blocks, like inserting something into a database or selecting something from a database, etc. But I always copy-paste my own code blocks from previous projects while working on a new one. So, I want to know whether this tendency is unique to me, or whether every beginner passes through the same phase during their journey to becoming a developer.
Please bear with me, because I know this isn't really a programming question and isn't worth your time either. I tried searching on Google as well, but this is a snapshot of what I found:
What I mean is that most of the search results dealt with copy-pasting other people's code, which is not what I am talking about. To save time, I copy-paste my own code blocks almost every time. So, how bad is this habit of mine?
I apologize again for posting a question that may not be worth your time, but I am finding it hard to learn to code by myself without a mentor nearby to clear up my doubts (I actually searched for a mentor who could teach PHP before starting all by myself, but found none in my area), so the Internet is what I mostly depend on for learning about anything.
This question probably belongs on https://softwareengineering.stackexchange.com but I'll try to give you a decent answer and some guidance.
People re-use their own code all the time. However, you want to avoid copy/paste where possible. The issue with copy/paste arises when something is used in more than a few places - like a MySQL database connection - and it needs updating. I'd rather modify one file (or one small group of files) and have all of my web apps fixed/updated than have to modify 2 or 3 database calls in 9 different web apps...
For things that I use everywhere/all the time - talking to our course management system's API, authenticating a user against our LDAP server, connecting to a MySQL database and running queries, processing forms that are emailed, etc. - I (or coworkers) have built up sets of functions, classes, etc., which I keep in a single directory and can include as needed.
If you do this, you want your functions/object methods to be as generic as possible. For example, my MySQL query function takes several arguments: an associative array with connection info (since we have several DB servers, chosen by purpose), a query, and an array of parameters. It returns an array with a status code and the appropriate data - the result set for selects, the ID of the last insert for inserts, or the count of rows affected for deletes/updates. This one function handles 50+ queries and connects to 4 different MySQL servers.
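To make that concrete, here is a minimal sketch of that kind of generic helper, assuming mysqli and a made-up shape for the connection-info array and the return value (the answer above doesn't show its actual code):

<?php
// Hypothetical generic query helper: one place to fix connections and
// error handling for every web app that includes it.
function run_query(array $conn, string $sql, array $params = []): array
{
    $db = mysqli_connect($conn['host'], $conn['user'], $conn['pass'], $conn['name']);
    if ($db === false) {
        return ['status' => 'error', 'message' => mysqli_connect_error()];
    }

    $stmt = $db->prepare($sql);
    if ($stmt === false) {
        return ['status' => 'error', 'message' => $db->error];
    }
    if ($params !== []) {
        // Bind everything as a string; MySQL coerces types on its side.
        $stmt->bind_param(str_repeat('s', count($params)), ...$params);
    }
    $stmt->execute();

    $result = $stmt->get_result();
    if ($result !== false) {
        // SELECT-style query: hand back the whole record set.
        return ['status' => 'ok', 'rows' => $result->fetch_all(MYSQLI_ASSOC)];
    }
    // INSERT/UPDATE/DELETE: hand back the last insert ID and row count.
    return ['status' => 'ok', 'insert_id' => $db->insert_id, 'affected' => $stmt->affected_rows];
}

A call might then look like run_query(['host' => 'db1', 'user' => 'app', 'pass' => '...', 'name' => 'courses'], 'SELECT * FROM users WHERE id = ?', [42]) - the point being that all nine web apps share this one file.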
I finished creating an accounting web application for an organization using CodeIgniter and a MySQL DB, and I have just submitted it to them. They liked the work, but they asked me how they would transfer their old manual data to the new online system, so that their members would be able to see their account balances and contribution history.
This is a major problem for me because most of my tables use referential integrity to ensure data synchronization, and they would not support the style of their manual accounting.
I know a lot of people here have faced cases like this, and I would love to know the best way to collect users' history. I also know this might be flagged as not a real question, but I really, really have to ask people with experience.
I would appreciate all answers. Thanks (and downvotes too)..
No matter what the case is, data conversions are very challenging and almost always time-consuming. Depending on the consistency of the data in question, it may be that about 80% of the data will transfer over neatly if you create a conversion program in PHP. That conversion code may, in and of itself, be more time-consuming than it is worth; but if you are talking hundreds of thousands of records and beyond, it is probably a good idea to make that conversion program work. Anyone who suggests there is a silver bullet is certainly not correct.
Here are a couple of suggested steps:
1. (Optional) Export your Excel spreadsheets to Access. Access can help you standardize data and has tools in place to help you locate records which have failed in some way. You can also create filters in Access if you need to. The benefit of taking this step, if you are familiar with Access, is that you have already begun the conversion process to a database. As a matter of fact, if you so desire, you can import your MySQL database information into Access as well. The benefit of this is pretty obvious: you can create a query and merge your two separate tables together to form one table, which could save you a great deal of coding.
2. Export your Access table/query into a CSV file. (Note: if you find Access is overkill, or if you don't have it, you can skip step 1 and simply save your .xls or .xlsx file as a .csv. This may require more legwork in your PHP conversion code, but that is probably a matter of preference. Some people prefer to avoid Access as much as possible, and if you don't normally use it you would be wasting time trying to learn it just to save yourself a little bit of time.)
3. Utilize PHP's built-in str_getcsv function (or fgetcsv, which reads a file line by line). This will convert the CSV data into a PHP array.
4. Create your automated program to parse through each record. Based on the column and its requirements, you can either accept or reject records. You can then export your data, such as was done in this SO answer, back to CSV. You can save two different CSV files, one with accepted records and one with rejected records. (A minimal sketch follows this list.)
5. With rejected records, which are all but inevitable when transferring from a spreadsheet, you will need a course of action. The simplest way for your clients is probably to give them a procedure to either manually import records into the database, if you've given them an interface to do so, or - probably simpler but requiring more back-and-forth - to update the records in Excel to be compliant with the new system.
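As a minimal sketch of step 4, assuming hypothetical file names and a made-up validation rule (say, every row needs a non-empty account number in column 0 and a numeric amount in column 1):

<?php
// Split a legacy CSV export into accepted and rejected record files.
$in       = fopen('legacy_export.csv', 'r');
$accepted = fopen('accepted.csv', 'w');
$rejected = fopen('rejected.csv', 'w');

while (($row = fgetcsv($in)) !== false) {
    $ok = isset($row[0], $row[1])
        && trim($row[0]) !== ''
        && is_numeric($row[1]);
    fputcsv($ok ? $accepted : $rejected, $row);
}

fclose($in);
fclose($accepted);
fclose($rejected);

The accepted file can then be loaded into MySQL (e.g. with LOAD DATA INFILE or row-by-row INSERTs), while the rejected file goes back to the client for correction.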
Edit
Based on the thread under your question, which sheds more light on what you are trying to do (i.e. have a parent for each transaction that is an accepted loan), you should be able to contrive a parent field, even if it is not complete, by creating a parent record for each set of transactions based around an account. You can do this via Access, PHP, or, more likely, a combination of the two.
Conclusion
Long story short, data conversions take time. If you put the time in up front, it will be far easier to maintain a standardized set of information in the long run. If you pick something which takes less time at the beginning, it will mean additional work for you in the long run to keep that "simple" fix working over time.
Similarly, the closer you can get the legacy data to conform to your new data model, the easier it will be for your clients to perform queries, etc. While this may mean that some manual entry will be required on your part or your client's, it is better to inform the client fully of the pros and cons of each method and let them decide. My recommendation would always be to put the extra work in up front, because it almost always ends up cheaper than dealing with a quick fix in the long run; but that is not always practical given real-world constraints.
I recently got an internship that involves writing an SQL database, and I don't really know where to start.
The database will hold the information of a ton of electrical products, including but not limited to wattage, brand, year, color, size and country of origin. The database's main task is to apply some formulas to the above information to output several energy-related figures. I will also be building a simple GUI for it. The database will be accessed solely on Windows computers by no more than 100 people, probably around 20.
The organization does not normally give internships, and especially not to programmers (they're basically all electrical engineers); I got it by really insisting to a relative who works there and who was looking into how to organize some of the products they oversee. In other words, I can't really ask THEM for guidance on the matter since they're not programmers, which is why I came here to get a feel for where I'm starting.
I digress -- my main concerns are:
What is some recommended reading or viewing for this? Tips, tricks?
How do I set up a server for this? What hardware/bandwidth/etc. do I require?
How do I set up the Database?
For the GUI client, I decided to look at having it be a window showing a webpage built with the SQL embedded in PHP. Is this a good idea? Any recommended reading for doing that? What are the alternatives?
What security measures would you recommend? Recommended reading?
I have: several versions of Microsoft SQL Server, classroom experience with MySQL and PHP, several versions of Visual Studio, access to old PCs for testing (up to and including switching operating systems, hardware, etc.), access to a fairly powerful PC (non-modifiable), and unlimited bandwidth.
Any help would be appreciated, thanks guys!
What is some recommended reading or viewing for this? Tips, tricks?
I'd recommend spending quite a bit of time in the design stage, before you even touch a computer. Just grab some scrap paper and a pencil and start sketching out various "screens" that your UI might expose at various stages (from menus to inputs and outputs); show them to your target users and see if your understanding of the application fits with the functionality they expect/require; consider when, where, how and why they will access and use the application; refine your design.
You will then have a (vague) functional specification, from which you will be able to answer some of the further questions posed below so that you can start researching and identifying the technical specification: at this stage you may settle upon a particular architecture (web-based?), or certain tools and technologies (PHP and MySQL?). You can then identify further resources (tutorials?) to help progress toward implementation.
How do I set up a server for this? What hardware/bandwidth/etc. do I require?
Other than the number of users, your post gives very little indication of the likely server load from which this question could be answered.
How much data will the database store ("a ton of electrical products" is pretty vague)? What operations will need to be performed ("use some formulas ... to output several energy-related things" is pretty vague)? What different classes of user will there be, and what activities will they perform? How often will those activities write data to and read data from the database (e.g. write 10 KiB once a month; read 10 GiB thousands of times per second)? Whilst you anticipate 20 users, will they all be active simultaneously, or will there typically only be one or two at any given time? How critical is the application (in terms of the reliability/performance required)?
Perhaps, for now, just install MySQL and see how you fare?
How do I set up the Database?
As in, how should you design the schema? This will depend upon the operations that you intend to perform. However, a good starting point might be a table of products:
CREATE TABLE products (
  product_id SERIAL PRIMARY KEY,
  power      INT UNSIGNED COMMENT 'watt usage',
  brand      VARCHAR(255),
  year       INT UNSIGNED,
  color      VARCHAR(15),
  size       INT UNSIGNED,
  origin     CHAR(2) COMMENT 'ISO 3166-1 country code'
);
Depending upon your requirements, you may then wish to create further tables and establish relationships between them.
For the GUI client, I decided to look at having it be a window showing a webpage built with the SQL embedded in PHP. Is this a good idea? Any recommended reading for doing that? What are the alternatives?
A web-based PHP application is certainly one option, for which you will find a ton of helpful free resources (tutorials, tools, libraries, frameworks, etc.) online. It is also highly portable, as virtually every device has a browser which will be able to interact with your application (albeit that ensuring smooth cross-browser compatibility and a good cross-device user experience can be a bit painful).
There are countless alternatives, using virtually any/every combination of languages, runtime environments and architectures that you could care to mention: from Java servlets to native Windows applications, from iOS apps to everything in between, the choice is limitless. However, the best advice is probably to stick to that with which you are already most comfortable/familiar (provided that it can meet the functional requirements of the application).
What security measures would you recommend? Recommended reading?
This is another pretty open-ended question. If you are developing a web app, I'd at the very least make yourself aware of (and of how to defend against) SQL injection and XSS attacks, and of techniques for managing user account security. Each of these areas alone is quite a large topic, and that's before one even begins to consider the security of the hosting platform or physical environment.
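As a minimal illustration of the SQL injection point, here is a sketch using PDO prepared statements (the DSN, credentials and table name are placeholders, not part of the question):

<?php
// Never interpolate user input into SQL text; pass values separately.
$pdo = new PDO('mysql:host=localhost;dbname=products', 'app_user', 'secret', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

// Bad:  "SELECT * FROM products WHERE brand = '" . $_GET['brand'] . "'"
// Good: the value travels separately from the SQL, so it cannot change
// the meaning of the statement.
$stmt = $pdo->prepare('SELECT * FROM products WHERE brand = ?');
$stmt->execute([$_GET['brand'] ?? '']);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);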
I'm currently developing a system for bus ticket reservation. The provider has many routes and different trips. I've set up a rather comprehensive database that maps all this together, but I'm having trouble getting the pathing algorithm working when it comes to cross-route reservations.
For example, if a user wants to go from Montreal to Sherbrooke, he'll only take what we call here Route #47. But in the event he goes to Sutton instead of Sherbrooke, he now has to transfer to Route #53 at some point.
Now, it isn't too hard to detect one and only one transfer. But when it comes to detecting the options for crossing multiple routes, I'm kinda scared. I've devised a cute and relatively efficient way to do 1-3 hops using only SQL, but I'm wondering how I should organize all this in a much broader spectrum, as the client will probably not stay with 2 routes for the rest of his life.
Example of what I've thought of so far:
StartingStop
joins to Route
joins to StopsOfTheRoute
joins to TransfersOnThatStop
joins to TargetStopOfThatTransfer
joins to RouteOfThatStop
joins to StopsOfThatNewRoute
[wash, rinse, repeat for more hops]
where StopsOfThatNewRoute = EndingStop
The problem is, if I have more than 3 hops, I'm sure my SQL server will choke rather fast under the pressure. Even if I correctly index my database, I can easily predict a major failure eventually...
Thank you
My understanding of your problem: you are looking for an algorithm that will help you identify a suitable path (covering segments of one or more routes).
This is, as Jason correctly points out, a pathfinding problem. For this kind of problem, you should probably start by having a look at Wikipedia's article on pathfinding, and then dig into the details of Dijkstra's algorithm. That should get you started.
You will, however, most likely soon realise that your data model might pose a problem, both from a structure and a performance perspective. A typical example would be if you need to manage time constraints: which path has the shortest travel time, assuming you find several? One path might be shorter, but only provide one ride per day, while another path might be longer but provide several rides per day.
A possible way of handling this is to create a graph where each node corresponds to a particular stop at a particular time. An edge would connect this stop in space-time both to other geographical stops and to itself at the next point in time.
My suggestion would be to start by reading up on the pathfinding algorithms, then revisit your data model with regard to any constraints you might have. Then focus on the structure for storing the data, and the structure for searching for paths.
A suggestion (not efficient, but it could work if you have a sufficient amount of RAM to spare): use the relational database server for storing the basics - stops, which routes are connected to which stops, and so on. It seems you have this covered already. Then build an in-memory representation of a graph given the constraints that you have. You could probably build your own library for this pretty quickly (I am not aware of any such libraries for PHP).
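As a minimal sketch of that in-memory approach, here is Dijkstra's algorithm over a hand-built adjacency list (the stops and travel times are made up; a real graph would be loaded from the stops/routes tables, and each node could be a (stop, departure time) pair to handle timetables):

<?php
// Find the cheapest path between two stops in a weighted graph.
function shortest_path(array $graph, string $from, string $to): array
{
    $dist = [$from => 0];
    $prev = [];
    $pq = new SplPriorityQueue();           // max-heap, so costs are negated
    $pq->insert($from, 0);

    while (!$pq->isEmpty()) {
        $node = $pq->extract();
        if ($node === $to) {
            break;
        }
        foreach ($graph[$node] ?? [] as $next => $cost) {
            $alt = $dist[$node] + $cost;
            if (!isset($dist[$next]) || $alt < $dist[$next]) {
                $dist[$next] = $alt;
                $prev[$next] = $node;
                $pq->insert($next, -$alt);
            }
        }
    }

    // Walk back from the target to reconstruct the path.
    $path = [];
    for ($n = $to; $n !== null && (isset($prev[$n]) || $n === $from); $n = $prev[$n] ?? null) {
        array_unshift($path, $n);
        if ($n === $from) {
            break;
        }
    }
    return $path;
}

// Hypothetical travel times in minutes.
$graph = [
    'Montreal'   => ['Sherbrooke' => 150, 'Granby' => 80],
    'Granby'     => ['Sutton' => 60],
    'Sherbrooke' => ['Sutton' => 90],
];
print_r(shortest_path($graph, 'Montreal', 'Sutton')); // Montreal, Granby, Sutton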
Another alternative could be to use a graph database such as Neo4j and its REST interface. I guess this would require some significant redesign of your application.
Hope this gives you some helpful pointers.
I've been working on a new site of mine for a couple of days now, which will be retrieving almost all of its most-used content from a MySQL database. Seeing as the database and website are still under development, the tables are really small at the moment and speed is of no concern yet.
But you know what they say, a little bit of hard work now saves you a headache later on.
Now, I'm only 17, and the only database I've ever been taught was through Microsoft Access, where we were practically given the database completed - we learned up to 3NF, but that was about it.
I remember reading once, when I was looking to pull data (randomly) out of a database, that large databases were taking several seconds/minutes to complete a single query, so this got me thinking. In a fraction of a second I can submit a search to Google; Google processes the query and returns the result, and my browser renders it - all done in the blink of an eye. And Google has billions of records to search through. They're also doing this for millions of users simultaneously.
I'm thinking, how do they do it? I know that they have huge data centers, but still.
I realize that it probably comes down to the design of the database, how it's been optimized, and obviously the configuration. And I guess that's really my question. Could someone please tell me how to design high-performance databases for millions/billions of rows (yes, I'm being optimistic), and possibly point me towards some good reading material to help me learn further?
Also, all my queries are done via PHP, if that's at all relevant to any answers.
The blog http://highscalability.com/ has some good articles and pointers to how companies handle large problems.
Specifically related to MySQL, you can Google for craigslist.org's use of MySQL.
http://www.slideshare.net/jzawodn/mysql-and-search-at-craigslist
First the good news... MySQL scales well (depending on the hardware) to at least hundreds of millions of rows.
Once you get to a certain point, a single database server will have trouble managing the load. That's when you get into the realm of partitioning or sharding... spreading the load across multiple database servers using any one of a number of different schemes (e.g. putting unrelated tables on different servers, or spreading a single table across multiple servers by using the ID or a date range as a partitioning key).
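As a minimal sketch of the second scheme, here is application-side routing by partitioning key (the shard hosts are hypothetical, and a real setup would also have to handle resharding, cross-shard queries and failover):

<?php
// Route every query for a given user to the same database server.
$shards = ['db1.example.com', 'db2.example.com', 'db3.example.com'];

function shard_for(int $user_id, array $shards): string
{
    return $shards[$user_id % count($shards)]; // simple modulo routing
}

$host = shard_for(123456, $shards);  // 123456 % 3 = 0 -> db1.example.com
$pdo  = new PDO("mysql:host=$host;dbname=app", 'app_user', 'secret');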
SQL databases can be sharded, but they are not fundamentally designed to shard well. There's a whole category of storage alternatives, collectively referred to as NoSQL, that are designed to solve that very problem (MongoDB, Cassandra and HBase are a few).
When you use SQL at very large scale, you run into any number of issues, such as making data model changes across a DB server farm, trouble keeping up with data backups, etc. That's a very complex topic, and people who solve it well are rare. For a glimpse at the issues, have a look at http://gigaom.com/cloud/facebook-trapped-in-mysql-fate-worse-than-death/
When selecting a database platform for a specific project, benchmark the solution early and often to understand whether or not it will meet the performance requirements you envision. Having a framework for doing that will help you learn about scalability, help you decide whether to invest effort in improving the data storage part of your solution, and help you know where best to invest your time.
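A minimal sketch of such a benchmark, assuming a placeholder DSN and query (measure with realistic row counts and realistic parameter distributions):

<?php
// Time a representative query to get a rough queries-per-second figure.
$pdo  = new PDO('mysql:host=localhost;dbname=app', 'app_user', 'secret');
$stmt = $pdo->prepare('SELECT * FROM posts WHERE author_id = ? LIMIT 50');

$runs  = 1000;
$start = microtime(true);
for ($i = 0; $i < $runs; $i++) {
    $stmt->execute([rand(1, 100000)]);
    $stmt->fetchAll();
}
$elapsed = microtime(true) - $start;
printf("%d queries in %.2fs (%.1f qps)\n", $runs, $elapsed, $runs / $elapsed);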
No one can tell you how to design databases. It comes after much reading and many hours working on them; a good design is the product of many, many years of doing them. As you've only seen Access, you have little database knowledge to build on yet. Search through Amazon.com and you'll get tons of titles. For someone who's starting out, any of them will do.
I mean no disrespect. I've been there, and I also tutor some people learning programming/database design. I do know that there's no silver bullet or shortcut for the work you have ahead.
If you intend to work with high-performance databases, you should keep something in mind: their design is per-application. A good design depends on learning more and more about how the app's users interact with the system, the usage patterns, etc. The things you'll learn from books will give you options; using them will depend heavily on the scenario.
Good luck!
It doesn't all come down to the design of the database, though that is indeed a big part of it. The people who made Google are geniuses, and if I'm not completely wrong about Google, you won't be able to find out exactly how they do what they do. I know that years back they had more than 10,000 computers processing queries, and today they probably have many more. I also suspect they cache the most recent/popular keywords. And all the websites have been indexed and analyzed using an unknown algorithm which ensures the computers don't have to look through all the words on every page.
In fact, Google crawls the entire internet roughly every 14 days, so when you do a search you are not searching the live internet. Your search gets broken down into keywords, and these keywords are used to narrow the number of relevant pages - and I'm pretty sure all pages have already been analyzed for important and/or relevant keywords before you even thought of visiting google.com.
Have a look at this question.
Have a look into Sphinx server.
http://sphinxsearch.com/
Craigslist uses it for their search engine. Basically, you give it a source and it indexes whatever you want (a MySQL database/table, text files, etc.). If it works for Craigslist, it should work for you.
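For what it's worth, Sphinx can be queried from PHP over SphinxQL, which speaks the MySQL wire protocol (by default on port 9306). A minimal sketch, with 'posts_idx' as a placeholder for whatever index you configure:

<?php
// Ask Sphinx for matching document IDs, then fetch the rows from MySQL.
$sphinx = mysqli_connect('127.0.0.1', '', '', '', 9306);
$term   = $sphinx->real_escape_string('used bicycle');
$res    = $sphinx->query("SELECT id FROM posts_idx WHERE MATCH('$term') LIMIT 20");
$ids    = array_column($res->fetch_all(MYSQLI_ASSOC), 'id');
// Then: SELECT ... FROM posts WHERE id IN (...) against the real database.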