Recently I've used MaxMind GeoIP to locate the country and city for an IP address. The .dat files hold a huge amount of data, yet retrieving a record takes only a fraction of a second, so I'm curious to learn about this technology and use it in PHP.
I've seen video files using the .dat extension before, and now it holds text information. So what is the .dat extension actually? Is it possible to read and write such files in PHP?
Thanks!
As far as I know, the .dat extension just denotes a generic file in which you can write whatever you need, in whatever format you please.
You could do that with any file, of course, but generally if you find an .xml file you assume it contains XML-formatted text; by contrast, .dat files can't be decoded by any specific software unless you know who wrote them and how.
The files will most likely be in a custom format that they developed; if it's open source you could reimplement it in PHP (if it isn't already written in PHP), or maybe access the data through an API.
The speed will come from the fact that the file is indexed in some way, or uses fixed-length records along the lines of "for every record, move 100 bytes further into the file".
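To give a feel for the second case, here is a minimal PHP sketch of reading fixed-length records; the 100-byte record size and the file name are made up for the example, not taken from MaxMind's actual format.

<?php
// Hypothetical file of fixed-length 100-byte records: lookups are fast
// because there is no parsing, just one seek and one read.
$recordSize  = 100;
$recordIndex = 1281;                         // the record we want

$handle = fopen('records.dat', 'rb');        // made-up file name
fseek($handle, $recordIndex * $recordSize);  // jump straight to the record
$raw = fread($handle, $recordSize);          // read exactly one record
fclose($handle);

// How the 100 bytes are interpreted depends entirely on the file's format,
// e.g. unpack() against a known binary layout.
var_dump($raw);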
There's a lot of questions here.
First, the file is a database - it stores data. There are lots of database models - relational, hierarchical, object-oriented, vector, hypercube, key-value stores... and there are off-the-shelf implementations of all of them.
Some databases are better suited to managing particular data structures than others. Geospatial data is a common specialization - so much so that a lot of other database types provide vector functionality (e.g. MySQL and PostgreSQL, which are relational databases).
For most database systems, the application using the database does not access the data file directly - access is mediated via another process. This is particularly relevant for PHP, since it typically runs as multiple independent processes with no sophisticated file-locking functionality.
So if you were looking to implement IP-to-geography lookups yourself, I'd recommend sticking to a relational database or a NoSQL key-value store (you don't need the geospatial stuff for forward lookups).
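For illustration, a rough sketch of such a forward lookup against a relational table; the ip_blocks table, its columns and the connection details are invented for the example, not part of any real product.

<?php
// Assumed (hypothetical) schema, with each range stored as integers:
//   CREATE TABLE ip_blocks (
//       range_start INT UNSIGNED NOT NULL,
//       range_end   INT UNSIGNED NOT NULL,
//       country     CHAR(2)      NOT NULL,
//       PRIMARY KEY (range_start)
//   );
$pdo = new PDO('mysql:host=localhost;dbname=geo', 'user', 'pass');

$ip = ip2long('203.0.113.7');   // dotted quad -> integer

$stmt = $pdo->prepare(
    'SELECT country, range_end FROM ip_blocks
     WHERE range_start <= :ip
     ORDER BY range_start DESC
     LIMIT 1'
);
$stmt->execute([':ip' => $ip]);
$row = $stmt->fetch(PDO::FETCH_ASSOC);

// The nearest range below the address may still not contain it,
// so check the upper bound before trusting the match.
if ($row !== false && $ip <= $row['range_end']) {
    echo $row['country'], PHP_EOL;
}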
But do bear in mind that IP-to-geo lookup data is not nearly as accurate or precise as the people selling the products would have you believe. If your objective is to get accurate position information about your users, the HTML5 geolocation API provides much better data - the problem is the availability of that functionality in users' browsers.
I am building a website for a client. The landing page has 4 areas of customizable content. The content is minimal, it's mainly just a reference to an image, an associated link, and an order...so 3 fields.
I am already using a lot of MySQL tables for the other CMS-related aspects, but for this one use, I am wondering if a database table really is the best option. The table would have only 4 records and 3 columns. It's not going to be written to very often, just read from as the landing page loads.
Would I be better off sticking to a MySQL table for storing this minimal amount of information since it will fit into the [programming] workflow easy enough? Or would using an XML file that stores the information be a better way to go?
UPDATE: The end user (who knows nothing about databases) will be going through a web interface I create to choose one of the 4 items they want to update, then uploading an image from their computer, then selecting the link from a list of pages on the site (or offsite). The XML file or Database table will store the location of the image on the server and the link to wrap it in.
A database is the correct solution for storing dynamic data. However I agree it sounds like MySQL is overkill for this situation. It also means an entire other thing to administer and manage. But using a flat-file like XML is a bad idea too. Luckily SQLite is just the thing.
Let these snippets from the SQLite site encourage you:
Rather than using fopen() to write XML or some proprietary format into disk files used by your application, use an SQLite database instead. You'll avoid having to write and troubleshoot a parser, your data will be more easily accessible and cross-platform, and your updates will be transactional.
Because it requires no configuration and stores information in ordinary disk files, SQLite is a popular choice as the database to back small to medium-sized websites.
Read more here.
As for PHP code to interface with the database, use either PDO or the specific SQLite extension.
PDO is more flexible so I'd go with that and it should work out of the box - "PDO and the PDO_SQLITE driver is enabled by default as of PHP 5.1.0." (reference) Finding references and tutorials is super easy and just a search away.
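As a rough sketch of what that could look like for the four landing-page items (the file path, table and column names are just invented for the example):

<?php
// Minimal PDO + SQLite sketch for the landing-page items.
$pdo = new PDO('sqlite:' . __DIR__ . '/landing.sqlite');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// One row per customizable area on the landing page.
$pdo->exec('CREATE TABLE IF NOT EXISTS landing_items (
    position   INTEGER PRIMARY KEY,   -- display order, 1..4
    image_path TEXT NOT NULL,
    link_url   TEXT NOT NULL
)');

// Update one of the four areas from the admin interface.
$stmt = $pdo->prepare(
    'REPLACE INTO landing_items (position, image_path, link_url)
     VALUES (:pos, :img, :url)'
);
$stmt->execute([':pos' => 1, ':img' => '/uploads/banner1.jpg', ':url' => '/sale']);

// Read all four when rendering the landing page.
$items = $pdo->query('SELECT * FROM landing_items ORDER BY position')
             ->fetchAll(PDO::FETCH_ASSOC);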
If you use PHP, the SimpleXML library could give you what you need, and using XML in this fashion for something like you describe might be the simplest way to go. You don't have to worry about any MySQL configuration or about the client messing something up. Restoring and maintaining an XML file might be easier too. The worry might be future scalability if you plan to grow the site much. That's my 2 cents anyway.
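In case it helps, a small sketch of that approach; the file name and element names are made up for the example.

<?php
// Load the items (create a skeleton document if none exists yet).
$file = __DIR__ . '/landing.xml';
$xml  = file_exists($file)
    ? simplexml_load_file($file)
    : new SimpleXMLElement('<items/>');

// Render: loop over the items when building the landing page.
foreach ($xml->item as $item) {
    printf('<a href="%s"><img src="%s"></a>',
        htmlspecialchars((string) $item->link),
        htmlspecialchars((string) $item->image));
}

// Update: append an item and write the file back out.
$item = $xml->addChild('item');
$item->addChild('image', '/uploads/banner1.jpg');
$item->addChild('link', '/sale');
$xml->asXML($file);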
Here is my problem:
I have many known locations (over which I have no influence), each with a lot of data. Each location offers me new data at its own intervals. Some give me differential updates, some just the whole dataset, some via XML, for some I have to build a web scraper, some need authentication, etc.
The collected data should be stored in a database, and I have to program an API that sends the requested data back as XML.
Many roads lead to Rome, but which should I choose?
Which software would you suggest I use?
I am familiar with C++, C#, Java, PHP, MySQL and JS, but new stuff is still fine.
My idea is to use cron jobs + PHP (or shell scripts) + curl to fetch the data.
Then I need a module to parse the data and insert it into a database (MySQL).
The data requests from clients could be answered by a PHP script.
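For concreteness, the fetch step I have in mind would be something like the following; the source URL and table layout are placeholders, not real endpoints.

<?php
// Run from cron, e.g.:  */15 * * * * php /path/to/fetch.php
$sources = [
    'example_feed' => 'https://example.com/export.xml',   // hypothetical
];

$pdo    = new PDO('mysql:host=localhost;dbname=collector', 'user', 'pass');
$insert = $pdo->prepare(
    'INSERT INTO raw_imports (source, fetched_at, payload) VALUES (?, NOW(), ?)'
);

foreach ($sources as $name => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);
    $payload = curl_exec($ch);
    curl_close($ch);

    if ($payload !== false) {
        // Store the raw payload; parsing/transforming happens in a later step.
        $insert->execute([$name, $payload]);
    }
}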
I think the input data volume is about 1-5GB/day.
The one correct answer doesn't exist, but can you give me some advice?
It would be great if you can show me smarter ways to do this.
Thank you very much :-)
LAMP: Stick to PHP and MySQL (with occasional forays into Perl/Python): the availability of PHP libraries, storage solutions, scalability and API solutions, plus the community size, more than makes up for what any other environment offers.
API: Ensure that the designed API queries (and storage/database) can meet all end-product needs before you get to writing any importers. Date ranges, tagging, special cases.
PERFORMANCE: If you need lightning fast queries for insanely large data sets, sphinx-search can help. It's got more than just text search (tags, binary, etc) but make sure you spec the server requirements with more RAM.
IMPORTER: Make it modular: for each different data source, write a pluggable importer that can be enabled/disabled by an admin and, of course, tested individually (a rough sketch of such a plugin contract follows below). Pick a language and library based on what's the best and easiest fit for the job: a bash script is okay.
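As a hedged illustration of that plugin idea (interface and class names are invented for the sketch):

<?php
// Hypothetical plugin contract for the modular importer described above.
interface Importer
{
    /** Unique name used by the admin UI to enable/disable the plugin. */
    public function name(): string;

    /** Fetch from the remote source and return raw records. */
    public function fetch(): iterable;
}

final class ExampleXmlImporter implements Importer
{
    public function name(): string
    {
        return 'example_xml';
    }

    public function fetch(): iterable
    {
        // Placeholder endpoint; real importers would handle auth,
        // differential updates, throttling, and so on.
        $xml = simplexml_load_file('https://example.com/export.xml');
        foreach ($xml->record as $record) {
            yield (array) $record;
        }
    }
}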
In terms of parsing libraries for PHP, there are many. One of the recently popular ones is simplehtmldom, and I found it to work quite well.
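From memory, scraping with it looks roughly like this (do double-check against the library's own docs; the URL is a placeholder):

<?php
include 'simple_html_dom.php';

$html = file_get_html('https://example.com/listing');

// Grab every link's target and text.
foreach ($html->find('a') as $link) {
    echo $link->href, ' => ', $link->plaintext, PHP_EOL;
}

$html->clear();   // free the DOM tree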
TRANSFORMER: Make data transformation routines modular as well so it can be written as a need arises. Don't make the importer alter original data, just make it the quickest way into an indexed database. Transformation routines (or later plugins) should be combined with API query for whatever end result.
TIMING: There is nothing wrong with cron executions, as long as they don't become runaway processes or cause your input sources to start throttling or blocking you, so keep an eye on that.
VERSIONING: Design the database, imports, etc to where errant data can be rolled back easily by an admin.
Vendor Solution: Check out scraperwiki - they've made a business out of scraping tools and data storage.
Hope this helps. Out of curiosity, any project details to volunteer? A colleague of mine is interested in exchanging notes.
I am currently developing ecommerce software using PHP/MySQL for a big company. There are two options for me to get certain data:
DB (for getting huge data, such as PRODUCTS, CATEGORIES, ORDERS, etc.)
TXT (using YAML -for getting analytical data and some options)
For instance, when a user goes to a product details page I need to get these TXT files:
Product summary file (product_hit, quantity_sold, etc.) - approximately 90KB max.
Language and settings file (such as company_name, translations for the template) - approximately 300KB max.
Maybe one more file (I don't know right now) - assume about 100KB.
I want to use this approach because the data is easily readable by humans and portable between programming languages. In addition, if I use the DB, I need to join a couple of tables, whereas these files keep everything together.
My txt file looks like (YAML):
product_id: 1281
quantity_sold: 12 #item(s)
hit: 1105
hit_avarage: 92 #quantity_sold/hit
vote: 2
...
But I am still not sure about speed and performance. Is using TXT files a good idea? Should I really use this approach instead of a DB?
As you can't partially include and parse a YAML file, you'll have to parse the file as a whole, which means that you'll have an incredible performance hit. You can compare this to selecting all rows from a database and then looping over them to find the one that you're looking for, instead of just typing a WHERE condition. So yes, a database is much faster to accomplish what you ask.
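To make the contrast concrete, a rough sketch of the two access patterns; it assumes the symfony/yaml component for parsing and an invented product_summary table, neither of which is from the question.

<?php
require 'vendor/autoload.php';

use Symfony\Component\Yaml\Yaml;

// File approach: the whole document has to be read and parsed
// before you can pick out a single value.
$summary = Yaml::parseFile('product_1281.yml');
$hits    = $summary['hit'];

// Database approach: the server uses its index and returns one row.
$pdo  = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');
$stmt = $pdo->prepare('SELECT hit FROM product_summary WHERE product_id = ?');
$stmt->execute([1281]);
$hits = $stmt->fetchColumn();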
Please do take a look at Document Based Databases though, you don't necessarily have to use a relational database. In fact, when looking at the example of the YAML file, I think using a "no SQL" database would be a better alternative.
Cheers.
I love YAML and think it's great for smaller amounts of data, but the dimensions you mention are better dealt with using a database. It's faster, and data can be indexed - in a file based scenario, you would have to walk through the whole file to find something.
Use the YAML approach. The data structure suggests that they are tantamount to fixed data / configuration settings. And if you cannot reasonably do the calculations within the database, then don't attempt to.
You could however convert your fixed data from YAML to CSV, and import them within the database into a temporary table. If and only if calculating everything there is feasible.
Cannot say anything about performance. Technically reading file data is as slow as having the database read disk sectors, and the difference between YAML parsing and column splitting might not be significant. You'll have to test that.
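If you do try the CSV route, it could look roughly like this; the symfony/yaml component, the file paths and the table layout are assumptions for the sketch, and LOAD DATA LOCAL needs to be enabled on your MySQL setup.

<?php
require 'vendor/autoload.php';

use Symfony\Component\Yaml\Yaml;

$data = Yaml::parseFile('product_1281.yml');

// Write a one-line CSV with a fixed column order.
$csv = fopen('/tmp/product_summary.csv', 'w');
fputcsv($csv, [$data['product_id'], $data['quantity_sold'], $data['hit'], $data['vote']]);
fclose($csv);

$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass', [
    PDO::MYSQL_ATTR_LOCAL_INFILE => true,   // required for LOAD DATA LOCAL
]);
$pdo->exec('CREATE TEMPORARY TABLE product_summary (
    product_id INT, quantity_sold INT, hit INT, vote INT
)');
$pdo->exec("LOAD DATA LOCAL INFILE '/tmp/product_summary.csv'
            INTO TABLE product_summary
            FIELDS TERMINATED BY ',' ENCLOSED BY '\"'");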
YAML is a 'human-readable data serialization format'.
Serialization is the process of converting in-memory structures into a format that can be written, possibly transmitted, and read back into in-memory structures.
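A tiny PHP illustration of that idea (using JSON here purely because the functions are built in; YAML is the same idea in another syntax):

<?php
$product = ['product_id' => 1281, 'quantity_sold' => 12, 'hit' => 1105];

// Serialize: in-memory array -> string that can be stored or transmitted.
file_put_contents('product.json', json_encode($product));

// Deserialize: string -> in-memory array again.
$restored = json_decode(file_get_contents('product.json'), true);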
Database management systems are programs that help control data management from creation through processing, including
security
scalability
concurrency
data integrity (atomicity, consistency, isolation and durability)
performance
availability
YAML does not provide tools or an integrated environment that takes care of the above. If you want to use it as your principal data store, you either need to isolate all of the above challenges away from the scenario that uses YAML as the principal data store, or reinvent those wheels to some extent, sooner or later.
I would imagine that no "e-commerce system for a big company" would want to sacrifice any of the above listed features for human readability.
I am in the planning stages of writing a CMS for my company. I find myself having to make the choice between saving page contents in a database or in folders on a file system. I have learned that PHP performs admirably well reading and writing to file systems, way better in fact than running SQL queries. But when it comes to saving pages and their data on a file system, there'll be a lot more involved than just reading and writing. Since pages will be drawn using a PHP class, the data for each page will be just data, no HTML. Therefore a parser for the files would have to be written. Also I doubt that all the data from a page will be saved in just one file, it would rather be saved in one directory, with content boxes and data in separated files.
All this would be done so much easier with MySQL, so what I want to ask you experts:
Will all the extra dilly-dallying with file-system saving outweigh its speed and resource advantage over MySQL?
Thanks for your time.
Go for MySQL. I'd say the only time you should think about using the file system is when you are storing files (BLOBs) of several megabytes; databases (at least the ones you typically use with a PHP website) are generally less performant at storing that kind of data. For the rest I'd say: always use a relational database. (Assuming you are dealing with data that has relations, of course; if it is random data there is not much benefit in using a relational database ;-)
Addition: If you define your own file-structure, and even your own way of cross referencing files you've already started building a 'database' yourself, that is not bad in itself -- it might be loads of fun! -- but you probably will not get the performance benefits you're looking for unless your situation is radically different than the other 80% of 'standard' websites on the web (a couple of pages with text and images on them). (If you are building google/youtube/flickr/facebook ... you've got a different situation and developing your own unique storage solution starts making sense)
Things to consider:
race conditions on file writes if two users edit the same piece of content (see the flock() sketch after this list)
distributing files across multiple servers as the CMS grows; replication latency will cause data-integrity problems
search performance; grepping files across multiple directories will be very slow
too many files in the same directory will hurt server performance, especially on Windows
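A minimal sketch of the race-condition point, assuming a single server and an example file name; without the exclusive lock, two editors saving the same file can interleave their writes.

<?php
$path       = 'content/page-about.html';      // example path
$newContent = '<h1>About us</h1>';            // whatever the editor submitted

$fp = fopen($path, 'c');            // open without truncating yet
if (flock($fp, LOCK_EX)) {          // block until we hold an exclusive lock
    ftruncate($fp, 0);              // now it is safe to replace the contents
    fwrite($fp, $newContent);
    fflush($fp);
    flock($fp, LOCK_UN);
}
fclose($fp);

// A database gives you this (plus cross-server consistency) for free,
// which is part of the argument for MySQL here.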
Assuming you have a low-traffic, single-server environment here…
If you expect to ever have to manage those entries outside of the CMS, my opinion is that it's much, much easier to do so with existing tools than with database access tools.
For example, there's huge value in being able to use awk, grep, sed, sort, uniq, etc. on textual data. Proxying that through a database makes this hard but not impossible.
Of course, this is just opinion based on experience.
Storing Data on the filesystem may be faster for large blobs that are always accessed as one piece of information. When implementing a CMS, you typically don't only have to deal with such blobs but also with structured information that has internal references (like content fields belonging to a certain page that has links to other pages...). SQL-Databases provide an easy way to access structured information, files on your filesystem do not (except of course simple hierarchical structures that can be represented with folders).
So if you wanted to store the structured data of your cms in files, you'd have to use a file format that allows you to save the internal references of your data, e.g. XML. But that means that you would have to parse those files, which is not only a lot of work but also makes the process of accessing the data slow again.
In short, use MySQL
Use a database and you get lots of important properties "for free" from the beginning, instead of reinventing them in some suboptimal way if you go the filesystem route. If you don't want to be constrained to MySQL only, you can use e.g. the database abstraction layer of the Doctrine project.
Additionally, you have tools like phpMyAdmin for easily looking up or manipulating your data, versus a text editor.
Keep in mind that the result of your database queries can almost always be cached in memory or even in the filesystem so you have the benefit of easier management with well known tools and similar performance.
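If you go the Doctrine route, a hedged sketch of its DBAL layer (method names vary a little between DBAL versions, and the pages table is invented for the example):

<?php
require 'vendor/autoload.php';

use Doctrine\DBAL\DriverManager;

$conn = DriverManager::getConnection([
    'driver'   => 'pdo_mysql',     // swapping the driver is how you avoid MySQL lock-in
    'host'     => 'localhost',
    'dbname'   => 'cms',
    'user'     => 'user',
    'password' => 'pass',
]);

// Hypothetical pages table for the CMS.
$pages = $conn->fetchAllAssociative(
    'SELECT id, title, body FROM pages WHERE published = ?', [1]
);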
When it comes to minor modifications of website content (e.g. fixing a typo or updating external links), I find it much easier to connect to the server using SSH and use various tools (text editors, grep, etc.) on files, rather than having to use the CMS interface to update each file manually (our CMS has such an interface).
Yet there are several questions to analyze and answer, as mentioned above - do you plan for scalability, concurrent modification of data, etc.?
No, it will not be worth it.
And there is no advantage to using the filesystem over a database unless you are the only user on the system (in which case the advantage would be lost anyway). As soon as the transactions start rolling in and updates cascade to multiple pages and multiple files, you will regret that you didn't use the database from the beginning :)
If you are set on using caching, experiment with some of the existing frameworks first. You will learn a lot from it. Maybe you can steal an idea or two for your CMS?
An SQL database is overkill if your storage needs are small. When I was young and dumb, I used a text file and flock()ed it when I needed to access it. This doesn't scale, but I still feel that non-database solutions have been completely ignored in Web 2.0.
Does anyone not use an SQL database for storage? What are the alternatives?
There are a lot of alternatives. But with SQLite, which gives you SQL power combined with the convenience of file-based storage, there is no need to look for them. SQLite is light enough to be used in cell phones and MP3 players, so I don't see how it could be considered overkill.
So unless your application needs something very specific, don't bother. Most alternatives are a lot harder to use and have less performance.
SQLite was invented for this.
It's just a flat file that contains a complete SQL database. You can query, update, insert and delete; there's little to no installation overhead, and all you need is the driver (which comes standard in PHP).
SQLite is a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine.
Kind of weird that nobody mentioned this already?
CouchDB (http://couchdb.apache.org/index.html) is a non-sql database, and seems to be a popular project these days, as well as Google's bigtable, or GT.M (http://sourceforge.net/projects/fis-gtm) which has been around forever.
Object databases abound as well; dbforobjects (http://www.db4o.com/), ZODB (http://www.zope.org/Products/StandaloneZODB), just to name a few.
All of these are supposedly faster and simpler than traditional SQL databases for certain use cases, but none approach the simplicity of a flat file.
A distributed hash table like Google Bigtable or Hadoop is a simple and scalable non-SQL database and often suits websites far better than an SQL database. SQL is great for complex relational data, but most websites don't have this requirement. Most websites store and retrieve data in a few forms and don't need to run complex operations on the data.
Take a look at one of these solutions as they will provide all of the concurrent access that you need but don't subscribe to the traditional ideas of data normalisation. They can be thought of as pretty analogous to a bunch of named text files.
It probably depends how dynamic your web site is. I used wiki software once that used RCS to check in and out text files. I wouldn't recommend that solution for something that gets as many updates as StackOverflow or Wikipedia. The thing about database is that they scale well, and the database engine writers have figured out all the fiddly little details of simultaneous access, load balancing, replication, etc.
I would say that it doesn't depend on whether you store less or more information, it depends on how often you request the stored data. Database managers are superb at caching queries, so they are often the better choice performance-wise. However, if you don't need a dynamic web page and are just loading static data, maybe a text file is the better option. Which format the data is stored in (i.e. XML, JSON, key=value pairs) doesn't matter - it's the I/O operations that are performance-heavy.
When I'm developing web applications, I always use an RDBMS as the primary data holder. If the web application doesn't need to serve dynamic data on every request, I simply apply caching, storing the data in a cache file that gets served when no new data has been added to the primary data source (the RDBMS).
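A time-based variant of that cache-file idea, as a rough sketch (the paths, query and freshness window are just examples):

<?php
$cacheFile = __DIR__ . '/cache/articles.html';
$maxAge    = 300;   // seconds a cached copy is considered fresh

if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAge) {
    readfile($cacheFile);            // cache hit: no database work at all
    exit;
}

// Cache miss: rebuild from the RDBMS and refresh the cache file.
$pdo  = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass');
$rows = $pdo->query('SELECT title FROM articles ORDER BY published_at DESC')
            ->fetchAll(PDO::FETCH_ASSOC);

ob_start();
foreach ($rows as $row) {
    echo '<h2>', htmlspecialchars($row['title']), '</h2>';
}
$html = ob_get_clean();

file_put_contents($cacheFile, $html, LOCK_EX);
echo $html;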
I wouldn't choose whether to use an SQL database based on how much data I wanted to store - I would choose based on what kind of data I wanted to store and how it is to be used.
Wikipedia defines a database as: "A database is a structured collection of records or data that is stored in a computer system." And I think your answer lies there: if you want to store records such as customer accounts, access rights and so on, then a DB such as MySQL or SQLite or whatever is not overkill. They give you a tried and trusted mechanism for managing those records.
If, on the other hand, your website stores and delivers unchanging file-based content such as PDFs, reports, mp3s and so on then simply storing them in a well-defined directory layout on a disk is more than enough. I would also include XML documents here: if you had for example a production department that created articles for a website in XML format there is no need to put them in a DB - store them on disk and use XSLT to deliver them.
Your choice of SQL or not will also depend on how the content you wish to store is to be retrieved. SQL is obviously good for retrieving many records based on search criteria whereas a directory tree, XML database, RDF database, etc are more likely to be used to retrieve single records.
Choice of storage mechanism is very important when trying to scale high-traffic site and stuffing everything into a SQL DB will quickly become a bottleneck.
It depends what you are storing. My blog uses Blosxom (written in Perl but a similar thing could be done for PHP) where each individual entry is a separate text file. The first line is plain text (the title) and the rest is unrestricted HTML. Following a few simple rules, these are rendered to form a simple but effective blogging framework.
It does have drawbacks but it also means that each post is a discrete file, which works well for updating on a local machine and then publishing to a remote web server. This is limited when it comes to efficient querying though, so certainly not a good choice if you want fine-grained control and web-based interaction with your data.
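For what it's worth, reading an entry in that format (first line is the title, the rest is HTML) is trivial in PHP too; a small sketch with an example file name:

<?php
$lines = file('posts/hello-world.txt');

$title = trim(array_shift($lines));   // first line: plain-text title
$body  = implode('', $lines);         // rest: unrestricted HTML

echo '<h1>', htmlspecialchars($title), '</h1>', $body;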
Check CouchDB.
I have used LINQ to XML as a data source in a .NET project. It was a small solution, and used caching to mitigate performance concerns. I would do it again for the quick site that just needs to keep data in a common place without increasing server requirements.
Depends on what you're storing and how you need to access it. Generally sql provides great reporting and manual management ability. Almost everything needs some way to manage what's stored and report on it.
In Perl I use DBM or Storable for such tasks. DBM will update automatically when variable is updated.
One level down from SQL databases is an ISAM (Indexed Sequential Access Method) - basically tables and indexes but no SQL and no explicit relationships among tables. As long as the conceptual basis fits your design, it will scale nicely. I've used Codebase effectively for a long time.
If you want to work with SQL-database-type data, then consider FileMaker.
A simple answer is that you can use any data storage format, from a defined standard, to a database (which generally involves a protocol), to a bespoke file format.
There are trade-offs for every choice you make in IT, and websites are certainly no different. In the early 2000s, file-based forum systems were popular because they allow anyone with limited technical ability to edit pages and posts. Completely static sites swiftly become unmanageable, and their content does not benefit from upgrades to the site's user interface; however, if coded correctly, such a site can simply be moved to a sub-directory or ripped into the new design. CMSs and dynamic systems bring their own set of problems, namely that there does not yet exist a widely adopted standard for data storage among them, and that they often rely on third-party plugins to provide features across design styles (despite their documentation advocating separation of function and form).
In 2016, it's pretty uncommon not to use a standard storage mechanism, such as a *SQL RDBMS; although static site generators such as Jekyll (powers a lot of GitHub pages); and independent players such as October CMS still provision for static file-based storage.
My personal preference is to use an *SQL-enabled RDBMS: it gives me familiar, powerful syntax that is standardised at least at the vendor level. But unlike a lot of people, I don't think this is the only way, and in most cases I would advocate using a site generator to save the parts that don't have to be dynamic to a static store, as this is the cheapest way to live on the web.
TLDR; it's up to you, SQL & RDBMS backed are popular.
Well, this is a bit of an open-ended question from the OP and there are two questions ... around SQL alternatives and non-SQL.
In general, in the "why is SQL good" category: it's a mature and robust standard that provides referential integrity. Java JDBC supports it fully, as do tools like TOAD, and there are many SQL implementations, such as the SQLite referenced earlier.
Now, "for a web-site" specifically is not particularly indicative of anything. Does a web-site need referential integrity? Maybe. If the business nature of the web-site is largely unstructured content, then one can consider any kind of persistent storage, from so-called "no-SQL" databases like AWS DynamoDB to Mongo (not a fan, though).
For managing the complexities of SQL stores - one suggestion, rather than a list of every persistence store ever created, is AWS Aurora (part of the RDS service). It is multi-region, active-active and fully MySQL-compatible. JDBC/ODBC-based driver frameworks work out of the box, and it effectively offers "zero administration".
I would check out XML if I were you. See w3schools XML tutorial section on the left side. Tons of possibilities without using SQL database.