I have thousands of photos on my site (each with a numeric PhotoID), and I have EXIF data for each (different photos can carry different EXIF tags).
I want to be able to store the data effectively and search it.
Some photos have more EXIF data than others, some have the same tags, and so on.
Basically, I want to be able to run queries like 'select all photos that have a GPS location' or 'all photos taken with a specific camera'.
I can't use MySQL (it won't scale well with the massive data size). I thought about Cassandra, but I don't think it lets me query on fields. I looked at SimpleDB, but I would rather not pay for the service, and I want to be able to run more advanced queries on the data.
Also, I use PHP and Linux, so it would be awesome if it could interface nicely with PHP.
Edit: I would prefer to stick with some form of NoSQL database.
Any ideas?
I also doubt that MySQL would have any load problems, but have a look at CouchDB:
Apache CouchDB is a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API.
Getting started with PHP and the CouchDB API.
CouchDB: The Definitive Guide
CouchDB basics for PHP developers
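To make that concrete, here's a minimal sketch of the document approach against CouchDB's REST API, using PHP and curl. The database name, view name, and EXIF field names are illustrative assumptions, not anything CouchDB prescribes:

    <?php
    // Tiny helper for talking JSON to CouchDB over HTTP.
    function couch($method, $path, $body = null) {
        $ch = curl_init('http://localhost:5984' . $path);
        curl_setopt($ch, CURLOPT_CUSTOMREQUEST, $method);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        if ($body !== null) {
            curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($body));
            curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
        }
        $response = curl_exec($ch);
        curl_close($ch);
        return json_decode($response, true);
    }

    couch('PUT', '/photos'); // create the database (one-time)

    // One photo = one document; photos with different EXIF tags
    // simply end up with different fields.
    couch('PUT', '/photos/1234', array(
        'exif' => array('Model' => 'Canon EOS 5D', 'GPSLatitude' => 51.5),
    ));

    // A view (stored once in a design document) indexes photos with GPS data.
    couch('PUT', '/photos/_design/exif', array(
        'views' => array('has_gps' => array(
            'map' => 'function(doc) { if (doc.exif && doc.exif.GPSLatitude) emit(doc._id, null); }',
        )),
    ));

    // 'Select all photos that have a GPS location':
    $withGps = couch('GET', '/photos/_design/exif/_view/has_gps');

Since the view is a persistent index, the GPS query stays cheap as the photo count grows.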
I would probably stick with MySQL personally, but if you are looking for a NoSQL-style system you might want to look into Solr. It allows things like faceted search (e.g. it tells you how many of your current search results fall into each resolution / format / etc., and lets you narrow the search that way).
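For the faceted-search side, here's a hedged sketch of a facet query against Solr's HTTP API from PHP; the core name (photos) and field name (camera) are illustrative and would come from your own schema:

    <?php
    // Ask Solr for matching photos plus a per-camera breakdown of the results.
    $params = http_build_query(array(
        'q'           => '*:*',
        'facet'       => 'true',
        'facet.field' => 'camera',
        'wt'          => 'json',
    ));
    $response = json_decode(
        file_get_contents('http://localhost:8983/solr/photos/select?' . $params),
        true
    );
    // Alternating value/count pairs, e.g. ['Canon EOS 5D', 312, 'Nikon D90', 87, ...]
    print_r($response['facet_counts']['facet_fields']['camera']);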
For a customer I'm working on a small project to index a bunch (around 30) of Excel spreadsheets. The main goal of the project is to search quickly through the uploaded Excel files. I've googled for a solution, but I haven't found an easy one yet.
Some options I'm considering:
- Do something manually with PHPExcel and MySQL, storing column information in meta tables, and use the table's FULLTEXT option to return search results.
- Use a document store (like MongoDB) to hold the files, combined with ElasticSearch / Solr to get fast results.
- A combination of both: use Solr on top of the relational database.
I think the second option is a bit of overkill; I don't want to spend too much time on this problem. I'd like to hear some opinions about this, and other suggestions are welcome :)
I agree with the others. I've done several systems in the past that suck spreadsheets into a database; it is an excellent way of getting a familiar user interface without any programming. I've tended to use email to get the spreadsheets to a central location, where they were read by MS Access and, in more recent years, by PHP into a MySQL database.
PHP is particularly good as you can connect it easily to a mail server to automatically read and process the spreadsheets.
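As a rough sketch of the PHPExcel-plus-MySQL route from the question (the table layout and the FULLTEXT index on cell_text are illustrative choices of mine, not a prescribed schema):

    <?php
    // Walk every cell of an uploaded workbook into a searchable table.
    require_once 'PHPExcel/IOFactory.php';

    $pdo = new PDO('mysql:host=localhost;dbname=sheets', 'user', 'pass');
    $insert = $pdo->prepare(
        'INSERT INTO cells (file, sheet, cell, cell_text) VALUES (?, ?, ?, ?)'
    );

    $workbook = PHPExcel_IOFactory::load('upload.xlsx');
    foreach ($workbook->getAllSheets() as $sheet) {
        foreach ($sheet->getRowIterator() as $row) {
            foreach ($row->getCellIterator() as $cell) {
                $insert->execute(array(
                    'upload.xlsx',
                    $sheet->getTitle(),
                    $cell->getCoordinate(),   // e.g. 'B7'
                    (string) $cell->getValue(),
                ));
            }
        }
    }

    // Search across all indexed spreadsheets (needs FULLTEXT(cell_text)):
    $hits = $pdo->query(
        "SELECT file, sheet, cell FROM cells
         WHERE MATCH(cell_text) AGAINST ('invoice')"
    )->fetchAll();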
I'm learning web-centric programming by writing myself a blog, using PHP with a MySQL database backend. This should replace my current (Drupal-based) blog.
I've decided that a post should contain some data: id, userID, title, content, time-posted. That makes a nice schema for a database table. I'm having issues deciding how I want to organize the storage of content, though.
I could either:
1. Use a file-based system: the content column would then hold the path of a locally stored file, which I'd read, format, and display.
2. Store the entire contents of the post in content, i.e. put it into the database.
If I went with (1), searching the contents of posts would be slightly problematic: I'd be limited to metadata searching, or I'd have to read the contents of each file when searching (though I don't know how much of a problem that would be; grep -ir "string" . isn't too slow...). On the other hand, images (if any) would be referenced by URL, so referencing content would at least be an internally consistent methodology, and I'd easily be able to reuse the content, since text files are ridiculously easy to work with compared to an SQL database file.
Going with (2), I could use a LONGTEXT column. The content would then need to be sanitised before I put it into the tuple, and I'm limited by size (although it's unlikely that I'd write a 4GB blog post ;). Searching would be easy.
I don't (currently) see which way would be (a) easier to implement, or (b) easier to live with.
Which way should I go / how is this normally done? Any further pros / cons for either (1) or (2) would be appreciated.
For the 'current generation', implementing a database is pretty much your safest bet. As you mentioned, it's pretty standard, and you outlined all of the fun stuff. Most SQL instances have a fairly powerful FULLTEXT (or equivalent) search.
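For instance, a minimal sketch of option (2) with MySQL full-text search, assuming a posts table with a FULLTEXT index on (title, content) - MyISAM historically, or InnoDB from MySQL 5.6 on:

    <?php
    $pdo = new PDO('mysql:host=localhost;dbname=blog', 'user', 'pass');

    $search = $pdo->prepare(
        'SELECT id, title FROM posts
         WHERE MATCH(title, content) AGAINST (:q IN NATURAL LANGUAGE MODE)'
    );
    $search->execute(array('q' => 'databases'));
    print_r($search->fetchAll(PDO::FETCH_ASSOC));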
You'll probably have just as much architecture to write either way, especially if you want one approach to reach feature parity with the other.
The up-and-coming technology is the key/value store, commonly referred to as NoSQL. With this, you can store your content and metadata as separate individual documents, but in a structured way that makes searching and retrieval quite fast. Some common NoSQL engines are MongoDB, CouchDB, and Redis (among others).
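As an illustration only, a small sketch with the (legacy) PHP mongo extension; the collection and field names are mine, not a required layout:

    <?php
    $mongo = new Mongo();                      // MongoClient in later drivers
    $posts = $mongo->selectDB('blog')->posts;

    // A post is just a document; no schema migration needed to add fields.
    $posts->insert(array(
        'userID'  => 1,
        'title'   => 'First post',
        'content' => 'Hello, world.',
        'posted'  => new MongoDate(),
    ));

    // A regex scan rather than an index hit, but fine at personal-blog scale:
    foreach ($posts->find(array('content' => new MongoRegex('/world/i'))) as $post) {
        echo $post['title'], "\n";
    }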
Ultimately this comes down to personal preference, along with a few use-case considerations. You didn't really outline what is important to you in terms of convenience and your application's needs. Any one of these would be just fine for a personal or development blog; building an entire platform with multiple contributors is a different conversation.
13 years ago I tried your option 1 (external files for text content) - not with a blog, but with a CMS - and I ended up shoveling it all back into the database for easier handling. It's much easier to do global replaces in the database than at the text-file level. With large numbers of posts you run into trouble with directory sizes and access speed, or you have to manage subdirectory schemes, etc. Stick to the database-only approach.
There are tools that make working with text easier than MySQL's built-in functions do, and with command-line clients like mysql and mysqldump you can easily extract any texts to the file-system level, work on them with standard tools, and re-load them into the database. What MySQL really lacks is built-in support for regex search/replace, but even for that you'll find a patch if you're willing to recompile MySQL.
I am building a website for a client. The landing page has 4 areas of customizable content. The content is minimal, it's mainly just a reference to an image, an associated link, and an order...so 3 fields.
I am already using a lot of MySQL Tables for the other CMS related aspects, but for this one use, I am wondering if a database table really is the best option. The table would have only 4 records and there would be 3 columns. It's not going to be written too very often, just read from as the landing page loads.
Would I be better off sticking with a MySQL table for storing this minimal amount of information, since it fits into the [programming] workflow easily enough? Or would an XML file that stores the information be a better way to go?
UPDATE: The end user (who knows nothing about databases) will be going through a web interface I create to choose one of the 4 items they want to update, then uploading an image from their computer, then selecting the link from a list of pages on the site (or offsite). The XML file or Database table will store the location of the image on the server and the link to wrap it in.
A database is the correct solution for storing dynamic data, but I agree that MySQL sounds like overkill for this situation - it's also an entire extra thing to administer and manage. Using a flat file like XML is a bad idea too, though. Luckily, SQLite is just the thing.
Let these snippets from the SQLite site encourage you:
Rather than using fopen() to write XML or some proprietary format into disk files used by your application, use an SQLite database instead. You'll avoid having to write and troubleshoot a parser, your data will be more easily accessible and cross-platform, and your updates will be transactional.
Because it requires no configuration and stores information in ordinary disk files, SQLite is a popular choice as the database to back small to medium-sized websites.
Read more here.
As for PHP code to interface with the database, use either PDO or the specific SQLite extension.
PDO is more flexible, so I'd go with that, and it should work out of the box - "PDO and the PDO_SQLITE driver is enabled by default as of PHP 5.1.0." (reference) Finding references and tutorials is super easy and just a search away.
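A minimal sketch of what the whole store might look like with PDO and SQLite (file, table, and column names are illustrative):

    <?php
    $db = new PDO('sqlite:' . __DIR__ . '/landing.sqlite');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $db->exec('CREATE TABLE IF NOT EXISTS slots (
        position INTEGER PRIMARY KEY,  -- 1..4, display order
        image    TEXT NOT NULL,        -- path of the uploaded image
        link     TEXT NOT NULL         -- URL the image links to
    )');

    // Admin interface: overwrite one of the four slots.
    $db->prepare('INSERT OR REPLACE INTO slots (position, image, link) VALUES (?, ?, ?)')
       ->execute(array(1, '/uploads/banner1.jpg', '/products'));

    // Landing page: read all four in order.
    $slots = $db->query('SELECT * FROM slots ORDER BY position')
                ->fetchAll(PDO::FETCH_ASSOC);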
If you use PHP, the SimpleXML library could give you what you need, and using XML in this fashion for something like you describe might be the simplest way to go. You don't have to worry about any MySQL configuration, or about the client messing something up. Restoring and maintaining an XML file might be easier, too. The worry would be future scalability, if you plan to grow the site much. That's my 2 cents, anyway.
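For comparison, a hedged sketch of the SimpleXML route, assuming a small hand-rolled layout along these lines:

    <?php
    /* landing.xml (illustrative):
       <slots>
         <slot position="1">
           <image>/uploads/banner1.jpg</image>
           <link>/products</link>
         </slot>
         ...
       </slots> */
    $xml = simplexml_load_file('landing.xml');
    foreach ($xml->slot as $slot) {
        printf('<a href="%s"><img src="%s"></a>',
            htmlspecialchars((string) $slot->link),
            htmlspecialchars((string) $slot->image));
    }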
Recently I've used MaxMind GeoIP to locate country and city based on IP. Its .dat files hold a huge amount of data, yet retrieving records from them happens within seconds, so I'm curious to learn about and use this technology in PHP.
I've seen video files using the .dat extension before, and now text data. So what is the .dat extension, actually? Is it possible to read and write such files in PHP?
Thanks!
As far as I know, the .dat extension denotes a generic file in which you can write whatever you need, in whatever format you please.
You could do that with any file, of course, but if you find an .xml file you generally assume it contains XML-formatted text; .dat files, by contrast, aren't recognized as something you can decode with specific software unless you know who wrote them and how.
The files will most likely be in a custom format that MaxMind developed; if it's open source you could reimplement it in PHP (if it isn't already written in PHP), or perhaps access the data through an API.
The speed will come from the file being indexed in some way, or from a fixed-size-record layout along the lines of 'for every record, move 100 bytes further into the file'.
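To illustrate the fixed-width idea in PHP (the 100-byte record size and the file name are just placeholders):

    <?php
    // With fixed-width records, reaching record N is a single seek, not a scan.
    function readRecord($fp, $n, $recordSize = 100) {
        fseek($fp, $n * $recordSize);   // jump straight to record N
        return fread($fp, $recordSize);
    }

    $fp = fopen('lookup.dat', 'rb');
    echo readRecord($fp, 42);           // record 42 starts at byte 4200
    fclose($fp);

If the records are also sorted, a binary search stays fast even for millions of entries (about 20 seeks for a million records), which is roughly how these lookup files achieve their speed.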
There are a lot of questions here.
First, the file is a database - it stores data. There are lots of database models - relational, hierarchical, object-oriented, vector, hypercube, key/value store... - and there are off-the-shelf implementations of all of them.
Some databases are better suited to managing particular data structures than others. Geospatial data is a common specialization - so much so that many other database types provide vector functionality (e.g. MySQL and PostgreSQL, which are relational databases).
For most database systems, the application using the services of the database does not access the data file directly; instead, access is mediated by another process. This is particularly relevant for PHP, since it typically runs as multiple independent processes with no sophisticated file-locking functionality.
So if you were looking to implement IP-to-geography lookups yourself, I'd recommend sticking to a relational database or a NoSQL key/value store (you don't need the geospatial features for forward lookups).
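As a sketch of such a forward lookup, assuming GeoIP-style CSV data imported into a table of integer IP ranges (the table and column names are mine):

    <?php
    $pdo = new PDO('mysql:host=localhost;dbname=geo', 'user', 'pass');

    // ip_from / ip_to hold addresses as unsigned integers, which is what
    // MySQL's INET_ATON() produces; index ip_from for fast range lookups.
    $stmt = $pdo->prepare(
        'SELECT country FROM ip_blocks
         WHERE INET_ATON(:ip) BETWEEN ip_from AND ip_to
         LIMIT 1'
    );
    $stmt->execute(array('ip' => $_SERVER['REMOTE_ADDR']));
    $country = $stmt->fetchColumn();

(On large tables, the classic trick of "WHERE ip_from <= INET_ATON(:ip) ORDER BY ip_from DESC LIMIT 1", followed by a check of ip_to, uses the index more effectively than BETWEEN.)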
But do bear in mind that IP-to-geo lookup data is not nearly as accurate or precise as the people selling the products would have you believe. If your objective is accurate position information about your users, the HTML5 geolocation API provides much better data - the problem there is the availability of the functionality in users' browsers.
I'm working with a Postgres database that I have no control over the administration of. I'm building a calendar that deals with seeing if resources (physical items) were online or offline on a specific day. Unfortunately, if they're offline I can only confirm this by finding the resource name in a text field.
I've been using
    SELECT * FROM log WHERE log_text LIKE 'Resource Kit 06%'
The problem is that when we're building a calendar, running LIKE 180+ times (at least 6 resources per day) is as slow as can be. Does anybody know of a way to speed this up (keeping in mind that I can't modify the database)? Also, if there's nothing I can do on the database end, is there anything I can do on the PHP end?
I think some form of cache will be required for this. Since you cannot change anything in the database, your only option is to pull the data out of it and store it in a more accessible, faster form. How well this works depends heavily on how frequently data is inserted into the table: if there are more inserts than selects, it probably won't help much; otherwise, there's a decent chance of improved performance.
You might also consider the Lucene search engine, which is capable of full-text indexing. There is an implementation from Zend, and Apache offers it as an HTTP service (Solr). I haven't had the opportunity to test it, however.
If you don't use something that robust, you can write your own caching mechanism in PHP. It will not be as fast as Postgres, but probably faster than unindexed LIKE queries. If your queries need to be more sophisticated (conditions, grouping, ordering...), you can use an SQLite database, which is file-based and doesn't need an extra service running on the server.
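A rough sketch of that SQLite-cache idea (connection details and table names are illustrative; the refresh step would run from cron rather than per page load):

    <?php
    $pg    = new PDO('pgsql:host=remotehost;dbname=logs', 'user', 'pass');
    $cache = new PDO('sqlite:' . __DIR__ . '/log_cache.sqlite');

    $cache->exec('CREATE TABLE IF NOT EXISTS log_cache (log_text TEXT)');
    $cache->exec('CREATE INDEX IF NOT EXISTS idx_text ON log_cache (log_text)');

    // Refresh: copy the rows across once...
    $cache->beginTransaction();
    $cache->exec('DELETE FROM log_cache');
    $ins = $cache->prepare('INSERT INTO log_cache (log_text) VALUES (?)');
    foreach ($pg->query('SELECT log_text FROM log') as $row) {
        $ins->execute(array($row['log_text']));
    }
    $cache->commit();

    // ...then the calendar's 180+ prefix searches hit the local file instead
    // of making 180+ round-trips to the remote Postgres server.
    $q = $cache->prepare('SELECT * FROM log_cache WHERE log_text LIKE ?');
    $q->execute(array('Resource Kit 06%'));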
Another option would be triggers in the database, which could, on insert, copy the required information into some other, better-indexed table. But without rights to administer the database, that is probably a dead end.
Please be more specific with your question if you want more specific information.