A DB with several hundred tables will usually have more than a few enums/sets defined in it.
How do you synchronize your PHP code with the enum values, to avoid using string literals in your code (which we all know seems easier, but in the long run causes stupid bugs)?
The only good solution I can think of is to run a process during the project build that generates constants for all the enums in the DB, or to create and compile a PHP extension with those values.
Any different, better solutions out there?
As Plato would say: either PHP and SQL are independent, or SQL is generated from PHP, or PHP is generated from SQL, or both are generated from the same source.
The first is usually unacceptable if there are a lot of changes happening, but otherwise quite stable.
The second is what happens with a lot of recent solutions that manage your SQL from the PHP side (and gracefully handle schema changes from the PHP side as well). This means your mad SQL ninja skills might be useless and frustrating.
The third is what you suggest. It's fine as long as you can manage the "generated during build" part without stepping on the toes of your source control system and your package system. Having a clean build system helps (which is somewhat unusual in PHP).
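As a sketch of that third option, assuming MySQL: a build step can read the enum definitions straight out of information_schema and write a constants file. The credentials, the mydb schema name, and the output file below are placeholders, not anything from the question:

<?php
// Build-time sketch: read enum definitions out of information_schema and
// emit a constants file. Credentials, the schema name 'mydb', and the
// output file name are all placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=information_schema', 'user', 'pass');

$sql = "SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE
        FROM COLUMNS
        WHERE TABLE_SCHEMA = 'mydb' AND DATA_TYPE = 'enum'";

$out = "<?php\n// Generated during build -- do not edit by hand.\n";
foreach ($pdo->query($sql) as $row) {
    // COLUMN_TYPE looks like: enum('pending','active','banned')
    // (naive parse: assumes the values themselves contain no quotes)
    preg_match_all("/'([^']*)'/", $row['COLUMN_TYPE'], $m);
    foreach ($m[1] as $value) {
        $name = strtoupper("{$row['TABLE_NAME']}_{$row['COLUMN_NAME']}_$value");
        $name = preg_replace('/[^A-Z0-9_]/', '_', $name);
        $out .= "define('$name', '$value');\n";
    }
}
file_put_contents('generated_enums.php', $out);

Committing the generated file (or rebuilding it in CI) is the part that has to play nicely with source control, as noted above.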
The fourth involves adapting an existing database modeling tool to generate both the SQL schema and the PHP code. This way, you would simply work with the tool once every two weeks, generate both PHP and SQL, and commit the result. Adapting the tool, however, can be quite nasty (then again, you can try to look for an XML export and work from there).
Here is my problem:
I have many known locations (over which I have no influence), each with a lot of data. Each location offers new data at its own intervals. Some give me differential updates, some just the whole dataset, some via XML; for some I have to build a web scraper, some need authentication, etc.
The collected data should be stored in a database, and I have to program an API that sends requested data back as XML.
Many roads lead to Rome, but which should I choose?
Which software would you suggest I use?
I am familiar with C++, C#, Java, PHP, MySQL, and JS, but new stuff is still okay.
My idea is to use cron jobs + PHP (or a shell script) + curl to fetch the data.
Then I need a module to parse the data and insert it into a database (MySQL).
A PHP script could answer the data requests from clients.
I think the input data volume is about 1-5GB/day.
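To sketch what I mean, a minimal fetch script for cron might look roughly like this (the URL, paths, and schedule are placeholders I made up):

<?php
// Sketch of a fetcher meant to be run from cron, e.g.:
//   */30 * * * * php /path/to/fetch.php
$url = 'https://example.com/feed.xml';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 120);
// Some sources need authentication, e.g.:
// curl_setopt($ch, CURLOPT_USERPWD, 'user:pass');
$data = curl_exec($ch);

if ($data === false) {
    error_log('fetch failed: ' . curl_error($ch));
    exit(1);
}
curl_close($ch);

// Stage the raw payload on disk; a separate parser/importer picks it up.
file_put_contents('/var/spool/feeds/feed-' . date('Ymd-His') . '.xml', $data);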
There is no single correct answer, but can you give me some advice?
It would be great if you can show me smarter ways to do this.
Thank you very much :-)
LAMP: Stick to PHP and MySQL (and make occasional forays into Perl/Python): the availability of PHP libraries, storage solutions, scalability and API solutions, and the community size more than make up for what other environments offer.
API: Ensure that the designed API queries (and storage/database) can meet all end-product needs before you get to writing any importers. Date ranges, tagging, special cases.
PERFORMANCE: If you need lightning-fast queries for insanely large data sets, sphinx-search can help. It offers more than just text search (tags, binary, etc.), but make sure you spec the server with extra RAM.
IMPORTER: Make it modular: for each different data source, write a pluggable importer that can be enabled/disabled by an admin and, of course, individually tested (see the sketch after this list). Pick a language and library based on what best and most easily fits the job: a bash script is okay.
In terms of parsing libraries for PHP, there are many. One of the recently popular ones is simplehtmldom, and I found it to work quite well.
TRANSFORMER: Make the data transformation routines modular as well, so they can be written as the need arises. Don't make the importer alter the original data; just make it the quickest path into an indexed database. Transformation routines (or later plugins) should be combined with the API query for whatever end result is needed.
TIMING: There is nothing wrong with cron executions, as long as they don't become runaway processes or cause your input sources to start throttling or blocking you, so stay aware of that.
VERSIONING: Design the database, imports, etc to where errant data can be rolled back easily by an admin.
Vendor Solution: Check out scraperwiki - they've made a business out of scraping tools and data storage.
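To illustrate the IMPORTER and TRANSFORMER points above, here is a minimal sketch of the pluggable shape I mean; the interface, the example class, and the URL are all invented for illustration:

<?php
// One plugin per data source: enable/disable per admin, test in isolation.
interface Importer {
    public function isEnabled(): bool;
    public function fetch(): string;            // raw payload (XML, HTML, ...)
    public function parse(string $raw): array;  // rows ready for insertion
}

class ExampleXmlImporter implements Importer {
    public function isEnabled(): bool { return true; }
    public function fetch(): string {
        return file_get_contents('https://example.com/export.xml');
    }
    public function parse(string $raw): array {
        $rows = [];
        foreach (new SimpleXMLElement($raw) as $item) {
            $rows[] = ['id' => (string)$item->id, 'name' => (string)$item->name];
        }
        return $rows;
    }
}

// The runner stays dumb: insert rows unaltered; transform later, at query time.
foreach ([new ExampleXmlImporter()] as $importer) {
    if ($importer->isEnabled()) {
        $rows = $importer->parse($importer->fetch());
        // ...insert $rows into MySQL here.
    }
}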
Hope this helps. Out of curiosity, any project details to volunteer? A colleague of mine is interested in exchanging notes.
Recently I've begun exploring ways to convert about 16k Corel Paradox 4.0 database tables to more modern formats (i.e. CSV, SQL, etc.) en masse; my client has been using the legacy platform for over 20 years, mainly due to massive logistical matters. So far I've been looking at PHP, since it has a library devoted to Paradox data processing. While I'm fairly confident about writing the conversion code (i.e. simply calling a few file open, close, and write functions), I'm concerned about error detection: I don't want to spend hours waiting for the script to run only to find 16k corrupt files exported.
Also, I'm not fully sure about the logic loop for processing the files. I'm thinking of having the program generate a list of all the files with the appropriate extension and then loop through the list, but I'm not sure that's ideal for a directory of this size.
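For illustration, this is roughly the loop I have in mind (the directory, the extension, and the convertTable() helper are placeholders; DirectoryIterator streams entries rather than building a 16k-element array up front):

<?php
$converted = 0;
$failed = 0;
foreach (new DirectoryIterator('C:/data/paradox') as $file) {
    if ($file->isFile() && strtolower($file->getExtension()) === 'db') {
        if (convertTable($file->getPathname())) {   // hypothetical converter
            $converted++;
        } else {
            $failed++;
            error_log('conversion failed: ' . $file->getFilename());
        }
    }
}
echo "done: $converted converted, $failed failed\n";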
This is being run on a local Windows 7 x64 system with XAMPP set up (the database is all internal use), so I'm not sure pure PHP is the best idea; I've been wondering whether Python or some other lightweight scripting language might be better for handling this.
Thanks very much in advance for any insights and assistance,
This is doubtless far too late to help you, but for posterity...
If one has a Corel Paradox working environment, one can just use it to ease the transition.
We moved the Corel Paradox 9 tables we had into an Oracle schema we built by connecting to the schema (using an alias such as SCHEMA001) then writing this Procedure in a script from inside Paradox:
Proc writeTable(targetTable String)
   errorTrapOnWarnings(Yes)
   try
      tc.open(targetTable)
      tc.copy(":SCHEMA001:" + targetTable)
      tc.close()
   onFail
      errorShow()
   endTry
endProc
One could highly refine this with more Paradox programming, but you get the idea. One thing we discovered, though, is that Paradox uses double quotes for the column names when it creates the Oracle version, which means you can get lower-case letters in column names in Oracle, which is a pain. We corrected that by writing a quick Oracle query to upper() all the resulting column names.
We called the procedure like so:
Var
   targetTable String
   tc TCursor
endVar

method run(var eventInfo Event)
   targetTable = "SomeTableName"
   writeTable(targetTable)
   msgInfo("TransferData.ssl--script finished", "That's all, folks!")
   return
endMethod
If you intend just to convert the data, which I guess is a one-time process, you will run the script locally as a command-line script. For that you don't need a website, and thus no XAMPP. Which language you pick is secondary, except that you say PHP has a library for this. Does Python, or another language, have one too?
As for your concern about error detection: why not test your script with only one file first? If that conversion is successful, you can build your loop and test it on, say, five files, i.e. with a counter that ends the process after that number. If that still works, you can go on with the rest. You can also write log data and dump a result for every 100 files processed; this way you can see whether your script is doing something or idling.
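A rough sketch of that ramp-up (the limit, the paths, and the convertTable() routine are placeholders):

<?php
// Stop after $limit files and log progress, so a stalled run is visible.
$limit = 5;          // raise to 100, then remove, as confidence grows
$processed = 0;
foreach (glob('C:/data/paradox/*.db') as $path) {
    if ($processed >= $limit) {
        break;
    }
    $ok = convertTable($path);   // hypothetical conversion routine
    $processed++;
    file_put_contents('convert.log',
        date('c') . ($ok ? ' OK ' : ' FAIL ') . $path . "\n", FILE_APPEND);
    if ($processed % 100 === 0) {
        echo "$processed files processed\n";
    }
}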
I'm planning a PHP website architecture. It will be a small website with few visitors and a small set of data. The data is modified exclusively by a single user (the administrator).
To make things easier, I don't want to bother with a real database or XML data. I'm thinking about storing all the data in several files via PHP serialization. So, for example, if there are several categories, I will store an array containing a Category class instance for each category.
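Roughly, this is the pattern I have in mind (the Category class and the file name are just examples):

<?php
class Category {
    public $name;
    public function __construct($name) { $this->name = $name; }
}

$categories = [new Category('News'), new Category('Reviews')];

// Save: the whole array becomes one string in one file.
file_put_contents('categories.dat', serialize($categories));

// Load it back on the next request.
$categories = unserialize(file_get_contents('categories.dat'));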
Are there any pitfalls using PHP serialization in those circumstances?
Use a database; it is not that difficult, and the extra time spent will be repaid by what you learn about working with databases.
The pitfalls I see are as Yehonatan mentioned:
1. Maintenance and adding functionality.
2. No easy way to query or look at data.
3. Very insecure -- take a look at "hackthissite.org". A lot of the beginner examples there involve hacking sites where someone hard-coded the data in files.
4. Serialization will work for one array, meaning one table. If you have to do anything like parent categories that must match up with other data, it's not going to work so well.
The pitfalls come with maintenance and adding functionality.
It is a very good way to learn, but you will appreciate databases more after the lessons.
I tried implementing PHP serialization to store website data. For those who want to do the same, here's feedback from a project started a few months ago and heavily modified since:
Pros:
It was very easy to load and save data. I don't have to write SQL queries, optimize them, etc. The code is shorter (with parametrized SQL queries, it may grow a lot).
The deployment does not require additional effort. We don't care about what is supported on the web server: if there is just PHP, with no additional extensions or database servers, the website will still work. SQLite is a good option, but it cannot be installed on some servers, and it also requires a PHP extension.
We don't have to care about updating a database server, nor about the database server to use (thus avoiding the scenario where the customer wants to migrate from Microsoft SQL Server to Oracle, etc.).
We can add more properties to the objects without having to break everything (just like we can add other columns to the database).
Cons:
Like Kerry said in his answer, there is "no easy way to query or look at data". This means any business intelligence/statistics use cases are impossible or require a huge amount of work; even some basic scenarios become extremely complicated. Let's say we store products and we want to know how many products there are. Instead of just writing select count(1) from Products, in my case it requires creating a PHP file just for that, loading all the data, then counting the items, sometimes adding stuff manually (see the sketch after this list).
Some changes required implementing data migration, which was painful and required more work than just executing an SQL query.
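To illustrate with the product count above, here is roughly what the one-line SQL query turns into (the file name is illustrative):

<?php
// The serialized-file equivalent of "select count(1) from Products":
// the whole dataset has to be loaded just to get one number.
$products = unserialize(file_get_contents('products.dat'));
echo count($products);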
To conclude, I would recommend using PHP serialization for storing data of a small website modified by a single person only if all the following conditions are true:
The deployment context is unknown and there are chances to have a server which supports only basic PHP with no extensions,
Nobody cares about business intelligence or similar usages of the information,
There will be no changes to the requirements with large impact on the data structure.
I would say use a small database like SQLite if you don't want to go through setting up a full DB server. However, I will also say that serializing an array and storing it in a text file is pretty dang fast. I once had to serialize an array with a few thousand records (a dump from a database) and used it as a temp database while our DB server was being rebuilt for a few days.
On a site with a reasonable amount of traffic, would it matter if the application/business logic is written as stored procedures, triggers, and views instead of inside the PHP code itself?
What would be the best way to go, keeping scalability in mind?
I can't provide you with statistics, but unless you plan to swap PHP for another language in the future, I can say that keeping the business logic in PHP is more "scalability friendly".
It's always easier and cheaper to solve load problems on the web server than in the database. Your database will always need to be lightning quick, and just throwing mirrors at it won't solve the problem: the more database slaves you have, the more writes you have to do.
In my experience, you should put business logic in PHP code rather than move it onto the database. Assuming your database is on a separate server, you don't want your database to be busy calculating formulas when requests come in.
Keep your database lightning fast to handle selects, inserts and updates.
I think you will have far better scalability keeping database code in the database, where it can be performance-tuned as the number of records grows. You will also have better data integrity, which is critical for the data to even be useful. You don't see a lot of terabyte-sized relational DBs with all their code in the application.
Read some books on database performance tuning and then decide if you want to risk your company's data on application code.
There are several things to consider when trying to decide whether to place the business logic in the database or in the application code.
Will the same database be accessed from different websites / web applications? Will the sites / applications be written in the same language or in different languages?
If the database will be used from a single site, and the site is written in a single language, then this is a non-issue. Otherwise, you'll need to weigh the added complexity of stored procedures, triggers, etc. against maintaining database access logic in multiple code bases.
What are relational databases in general good for, and what is MySQL good for specifically? What is PHP best at?
This consideration is fairly straightforward. Relational databases across the board, and specifically any variant of SQL, do a great job at inserting, updating, and deleting data. Generally they also handle atomic transactions well. However, most variants of SQL (including MySQL) are not good at complex calculations, on-the-fly date handling, file system access, etc.
PHP, on the other hand, is very fast at handling calculations, dates, and file system access. With a little design effort you can even arrange your PHP code so that records are retrieved only once and then reused as needed.
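A minimal sketch of that retrieve-once idea (the function, table, and column names are invented):

<?php
// A request-scoped cache in front of the database: repeat calls in the
// same request never hit MySQL again.
function getUser(PDO $pdo, int $id): array {
    static $cache = [];
    if (!isset($cache[$id])) {
        $stmt = $pdo->prepare('SELECT * FROM users WHERE id = ?');
        $stmt->execute([$id]);
        $cache[$id] = $stmt->fetch(PDO::FETCH_ASSOC) ?: [];
    }
    return $cache[$id];
}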
What are you most familiar / comfortable with using?
Obviously it tends to make more sense to use the tool with which you are most familiar.
As a last point consider that just because a drill can be used to cut sheet rock or because a hammer can be used to drive a screw doesn't mean that they should be used for these things. Sometimes I think that programmers do more potential damage by trying to make more powerful tools that do everything rather than making simpler tools that do one thing really, really well.
A well-done PHP application should be enough, but keep in mind that it also requires you to make as few calls to the database as you can. Store values you'll need later in PHP, shorten queries, cache, etc.
MySQL optimization is always a must, as it will also decrease the number of database calls made from PHP and thus improve performance. So you can't dismiss stored procedures, etc., if your aim is to increase performance. But MySQL by itself won't be enough if your PHP code isn't well done (lots of unnecessary database calls); that's why PHP must be well coded, keeping the whole process in mind during development so that unnecessary work doesn't get in the way. Caching, for instance, in "duet" with properly tuned MySQL, is a great performance boost.
My POV, even though I don't have much experience developing large applications, is to write business logic in the DB, for a few reasons:
1 - Maintainability: languages deprecate functions and change many other things over short periods, so when PHP changes version, you'll need to adapt your code to the new version.
2 - DBs tend to be more language-stable: when a new version of an RDBMS comes out, it usually changes little or nothing about the way you write your queries or SPs. Writing your logic in the DB reduces the code adaptation needed for a new DB version.
3 - An RDBMS is more likely to stay alive for a long period than a programming language. Also, since your data is critical, RDBMS developers take great care to migrate your whole database, including your SPs, automatically to the new RDBMS version. When Clipper died, there was no way to migrate systems to a new programming language; they had to be completely rewritten.
4 - If you think you may someday completely change the language the application is written in (language death, for example), the only things to rewrite will be the presentation layer and the SP calls, not the business logic.
I'd like to know from other people here whether what I pointed out makes sense, and if not, why. I'm in the same situation as Sabeen Malik: I'm about to begin my first huge project, and I'm leaning towards SPs because of what I wrote above. So it's time to correct my POV if it's off.
MySQL sucks at using advanced DB techniques, it's simple and fast. PHP, being a dynamic language, makes processing data very easy. Therefore, it usually makes sense to use PHP.
I'm creating an app that will rely on a database, and I fully intend to use a flat-file DB. Are there any serious reasons to stay away from this?
I'm using mimesis (http://mimesis.110mb.com)
It's simpler than using MySQL, which I have to admit I have little experience with.
I'm wondering about the security of the DB, but the files are stored as PHP and it seems to be a solid database solution.
I really like the ease of backing up and transporting the databases, which I have found harder with MySQL. I see that everyone seems to prefer the MySQL way, and it likely is faster when it comes to queries; but other than that, is there any reason to stay away from flat-file DBs and (finally) properly learn MySQL?
Edit:
Just to let people know, I ended up going with MySQL and am using the CodeIgniter framework. I still like the flat-file DB, but have now realized that it's way more complex than necessary for this project.
Use SQLite, you get a database with many SQL features and yet it's only a single file.
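A minimal sketch of how little that takes with PDO (the app.db file name is arbitrary):

<?php
// The entire database lives in one file, so backup is a single copy.
$db = new PDO('sqlite:app.db');
$db->exec('CREATE TABLE IF NOT EXISTS categories (id INTEGER PRIMARY KEY, name TEXT)');

$stmt = $db->prepare('INSERT INTO categories (name) VALUES (?)');
$stmt->execute(['News']);

foreach ($db->query('SELECT id, name FROM categories') as $row) {
    echo $row['id'] . ': ' . $row['name'] . "\n";
}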
Greetings, I'm the creator of Mimesis. Relational databases and SQL are important in situations where you have massive amounts of data to handle. Are flat files superior to relational databases? Well, you could ask Google, whose entire archiving system works with flat files, and it's the most popular search engine on Earth. Does Mimesis compare to their system? Likely not.
Mimesis was created to solve a particular niche problem. I only use free websites for my online endeavors. Plenty of free sites offer the ability to use PHP. However, they don't provide free SQL database access. Therefore, I needed to create a database that would store data, implement locking, and work around file permissions. These were the primary design parameters of Mimesis, and it succeeds on all of those.
If you need an idea of Mimesis's speed: navigate to the front page, and it will tell you which country you're viewing the site from. That free database comes from ip2nation.com, ported into a Mimesis ffdb, and it has hundreds if not thousands of entries.
Furthermore, the hit counter on the main page has already tracked over 7000 visitors. These are UNIQUE visits, which means the script has to search the database to see whether the visiting IP address already exists, and it also counts the total IPs.
As you may have noticed, the main page loads pretty quickly, and it has two fairly intensive Mimesis database scripts running on the backend. The way Mimesis stores data is designed to speed up read, write, and translation procedures. Most ffdb example scripts, and other ffdb scripts out there, use a simple CSV file or some such structure for storing data. Mimesis actually interprets binary data at some levels to augment its functionality; it is somewhat of a hybrid between a flat-file database and a relational database.
Most other ffdb scripts rewrite the COMPLETE file every time an update is made. Mimesis does not do this: it rewrites only the structural file and updates the actual row contents, so even if an error occurs you lose only newly added data, not any of the older data. Mimesis also maintains its history: unless the table is refreshed, the data the rows previously held is still contained within.
I could keep going on about all the features, but this isn't intended as a "Mimesis is the greatest database ever" rant. Rather, it's intended to open people's eyes to the fact that SQL isn't the ONLY technology available, and that flat files, given proper development paradigms, can be superior to a relational database for the specialized cases they target.
Long live flat files and the coders who brave the headaches that follow.
The answer is "Fine" if you only NEED a flat-file structure. One test: Would a single simple spreadsheet handle all needs? If not, you need a relational structure, not a flat file.
If you're not sure, perhaps you can start flat-file. SQLite is a great app for getting started.
It's no fun to learn you made the wrong choice if you figure it out too far along in the process. But if you understand the importance of a relational structure, and upsize early on when needed, you'll be fine.
"I really like the ease of backing up and transporting the databases, which I have found harder with mySQL."
Use SQLite, as mentioned in another answer: there is only one file to back up. Or set up periodic dumps of your MySQL databases to SQL files, which is a relatively simple thing to do.
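A sketch of the dump approach, run from cron (the credentials, database name, and paths are placeholders; mysqldump writes the whole database to one SQL file):

<?php
$file = '/backups/mydb-' . date('Ymd') . '.sql';
$cmd = sprintf(
    'mysqldump --user=%s --password=%s mydb > %s',
    escapeshellarg('backupuser'),
    escapeshellarg('secret'),
    escapeshellarg($file)
);
exec($cmd, $output, $status);
if ($status !== 0) {
    error_log('backup failed with exit code ' . $status);
}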
"I see that everyone seems to prefer the mySQL way - and it likely is faster when it comes to queries"
Speed is definitely a consideration. Databases tend to be a lot faster, because the data is organized better.
"other than that is there any reason to stay away from flat-file dbs and (finally) properly learn mysql?"
There sure are plenty of reasons to use a database solution, but there are arguments to be made for flat files. It is always good to learn things other than what you "usually" use.
Most decisions depend on the application. How many concurrent users are you going to have? Do you need transaction support?
I wanted to let people know that Mimesis has moved from its original URL to http://mimesis.site11.com/
Furthermore, I am shifting the focus of Mimesis from an ffdb to a key-value store, which makes more sense given the types of information I'm storing and the methods I use to retrieve it. There was also a grave error in the Mimesis code (which I've since fixed). However, I'm still in the testing phase of the new key-value store, and I've also been sidetracked by other things. Locking has also been changed from file creation to directory creation as the mutex mechanism.
Interoperability: MySQL can be interfaced with from basically any language that counts, while Mimesis is unlikely to be usable outside PHP.
This becomes significant the moment you try to use profilers, or modify data from the outside.
You might also look at http://lukeplant.me.uk/resources/flatfile/ for the PHP Flatfile Package.
The issue with going flat-file is that adjusting the system for further development means altering a significant amount of code to improve its foundation, whereas a pure SQL system would require little to no modification to move forward.