Exploring search options for PHP

I have an InnoDB table with numerous foreign keys, but we just want to look up some basic info from it.
I've done some research but I'm still lost.
How can I tell if my host already has Sphinx installed? I don't see it as an option for the table storage engine (i.e. InnoDB, MyISAM).
Is Zend_Search_Lucene responsive enough for AJAX functionality over millions of records?
Mirror my InnoDB table with a MyISAM one? Make every InnoDB transaction end with a write to the MyISAM table, then use 1:1 lookups? How would I do this automagically? This should make the MyISAM copy ACID-compliant and free(er) from corruption, no?
PostgreSQL full-text queries don't even look like SQL to me, and I don't have time to learn a new SQL syntax. I need noob options.
????????????????????
This is a high-volume site on a decently equipped VPS.
Thanks very much for any ideas.

Sphinx is a very good choice. It is very scalable, with built-in clustering and sharding.

Your question is vague about what you actually want to accomplish, but I can tell you to stay away from Zend_Search_Lucene with record counts that high. In my experience (and that of many others, including Zend Certified Engineers), ZSL's performance on large record sets is poor at best. Use a tool like Apache Lucene instead if you go that route.

Related

Database Design and Solr

We have a database of about 500 000 records that has non-normalized data (Vehicles for sale). We have a master MySQL DB and to enable fast searching, we update a Solr index whenever changes are made. Most of our data is served from the Solr index due to the complex nature of the joins and relationships in the MySQL DB.
We have started to run into problems with the speed and integrity of updates from within Solr. When we push updates using soft commits, we find that it takes ~1 second for the changes to be visible. While it isn’t a big issue at the moment, we are concerned that the problem will get worse and we want to have a solution before we get there.
We are after some guidance on what solutions we should be looking at:
How big is our dataset in comparison to other solutions using Solr in this manner?
We are only using 1 server for Solr at the moment. What is the split point to move to clustering and will that help or hinder our update problem?
One solution we have seen is using a NoSQL DB for some of the data. Do NoSQL DBs have better performance on a record by record level?
Are there some other options that might be worth looking into?
I'll answer your questions in sequence:
1) No, your dataset is not that large. Anything below 1 million records is fine for Solr.
2) Using a single Solr server is not a good option. Try SolrCloud; it is the best way to make Solr highly available, and it should improve your performance.
3) Both SQL and NoSQL databases have their advantages and disadvantages, and which fits depends on your dataset. In general, NoSQL databases are faster for record-by-record access.
4) I suggest going with SolrCloud. It is fast and reliable.

Is MongoDB + Socket.io better than MySQL + Socket.io for a real time app?

I am building a real-time app and am wondering if I should bother moving from MySQL to MongoDB. My app has a ton of writes happening, though reads are more frequent still. I am currently using XHR between the client and the server but am almost done moving to Socket.io too.
My research does make me want to move to MongoDB + Socket.io, but wanted to get some thoughts from the community.
Update: I am currently defining 'better' as a faster app, if that makes any sense. I think I am able to live without SQL; I am currently using 0 JOINs etc. But I was trying to see if anybody out there had any experience moving from MySQL to MongoDB for a 'generic' real-time app.
Thank you.
It depends on how you define "better".
If the relational model and sets are more important to you, MySQL is "better" than MongoDB.
If you can give up ACID, and your data is more document based, MongoDB is "better" than MySQL.
It's difficult to answer in any case, but especially so without knowing more about your use cases.
Maybe late to add this, but I've had some experience with a real-time analytics application moving from MySQL to MongoDB. In my experience MySQL was much faster and more responsive because my app was so write-heavy. By using InnoDB I was able to get row-level locks, whereas when I was using MongoDB I had to deal with a global lock. Not sure if they've since resolved the "global" lock thing.
My final solution was to use InnoDB and the Percona release of MySQL.
MongoDB isn't a relational database management system. Saying MongoDB is better than any RDBMS without first specifying the data shows a real lack of knowledge.
Key constraints don't exist in MongoDB. There are no referential integrity checks. We use MongoDB for blob storage and store the MongoDB keys in SQL Server. It's great for that, but I would never outright replace a normalized SQL Server or MySQL RDBMS instance with MongoDB.
Maybe a lot of people see MongoDB as being better because they don't want to worry about optimizing indexes and execution plans, but that comes at a cost to data integrity. It's a schemaless structure, which means consistency between documents isn't enforced. It will eat anything you feed it, which can be dangerous.
Think about what happens over time as you add/remove properties from your JSON or add/remove reference data based on changing business rules. Think about what that conversion would look like in MongoDB compared to an RDBMS instance.
I know I sound pretty critical of MongoDB but I don't mean to. We use it and it works well for our needs, but it's not a replacement for an RDBMS. It's more of a supplement.

Should a forum posts table use MyISAM or InnoDB

For a forum, should I use MyISAM or InnoDB for the table that stores the text?
I know that MyISAM supports full text searching, which sounds like it's what I should go for, but while reading, I've come across this very confusing sentence.
"You should use InnoDB when transactions are important, such as for situations in which INSERTs and SELECTs are being interleaved, such as online message boards or forums."
Well, but for "online message boards or forums" I would need good search over the text of each post, so doesn't it make more sense to use MyISAM? Can someone clarify what's most commonly used?
Apart from MySQL's native full text searching solution, which, as you already identified, requires the MyISAM storage engine, you may also want to consider the popular third-party search engines Sphinx and Apache Lucene, with which you can use InnoDB tables.
You could also stick to MySQL's solution, and use InnoDB storage for all tables except for one table which will simply hold the post_id and text_body. This is assuming that you only require full text searching for the content of forum posts.
In addition, note that while the major deficiency of MyISAM is the lack of transaction support, there are other important issues as well for a production system. You may want to check the following article for further reading:
MySQL Performance Blog: Using MyISAM in production
Generally speaking, for non-critical data, I'd use InnoDB tuned for speed. (E.g., I wouldn't flush binary logs on every commit.) I'd only use MyISAM for things like logging tables.
As others have mentioned, you can use external search indexes if you need to do full text searches. They are more powerful and faster than MySQL's.
Regarding the part about transactions, it's a bit of a lie. You could do the same thing with MyISAM by locking tables for WRITE. If you are inserting into, say, three tables when posting a message, you could lock all three tables for WRITE, and the data would look consistent (assuming all three INSERTS succeeded).
But the problem with that is you are locking the entire table, which can slow down concurrent usage. In fact, that is one of InnoDB's major advantages over MyISAM: it has row-level locking.
In short: if you are unable to set up an external indexer, and your site does not experience too much traffic, then MyISAM will work fine despite it being less robust. You can emulate the consistency of transactions with table locks. (I do not mean to make it sound like the table locking is somehow equivalent to transactions.)
Otherwise, I'd use InnoDB with an external indexing service.
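The table-lock approach described in this answer can be sketched roughly as follows. This is an illustration under stated assumptions, not a drop-in implementation: the table names (`posts`, `post_bodies`, `post_activity`) and the `post_message` helper are hypothetical, and `RecordingCursor` is only a stand-in for a real DB-API cursor so the sketch runs without a MySQL server. `LOCK TABLES ... WRITE` and `UNLOCK TABLES` are the MySQL statements involved.

```python
from contextlib import contextmanager


@contextmanager
def write_locked(cursor, tables):
    """Hold MySQL WRITE locks on `tables` while the with-block runs."""
    lock_list = ", ".join(f"`{name}` WRITE" for name in tables)
    cursor.execute(f"LOCK TABLES {lock_list}")
    try:
        yield cursor
    finally:
        # Always release the locks, even if one of the statements raised.
        # Note: this does NOT undo earlier statements; there is no rollback.
        cursor.execute("UNLOCK TABLES")


class RecordingCursor:
    """Stand-in for a DB-API cursor: it only records the SQL it is given."""

    def __init__(self):
        self.statements = []

    def execute(self, sql, params=None):
        self.statements.append(sql)


def post_message(cursor, post_id, user_id, body):
    """Insert one forum post into three (hypothetical) MyISAM tables."""
    with write_locked(cursor, ["posts", "post_bodies", "post_activity"]):
        cursor.execute(
            "INSERT INTO posts (id, user_id) VALUES (%s, %s)",
            (post_id, user_id),
        )
        cursor.execute(
            "INSERT INTO post_bodies (post_id, body) VALUES (%s, %s)",
            (post_id, body),
        )
        cursor.execute(
            "INSERT INTO post_activity (post_id, action) VALUES (%s, %s)",
            (post_id, "created"),
        )


cursor = RecordingCursor()
post_message(cursor, 1, 42, "Hello, forum")
for statement in cursor.statements:
    print(statement)
```

Other sessions cannot read or write the three tables while the locks are held, so they never see a half-written post. What the locks do not provide is atomicity: if the second INSERT fails, the first one stays in place, which is the gap between this emulation and a real InnoDB transaction.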
For any large volume of text I'd use InnoDB to store it and Sphinx to index it. It provides a lot more search features than what's built into MySQL.

Which DB/DB Engine supports search well?

I'm starting a site which relies heavily on search. While it's probably going to search basic meta data in the beginning, it might grow to something bigger in the future.
So which DB/DB Engine is best in your opinion when it comes to search performance and future scalability?
Appreciate your help
It depends on what you are searching.
If you are doing a lot of text searching then you want more than just a database - you also want a search algorithm. You can find them around the web and they can use several databases as backends.
However, if you only need simple text searches then MySQL MyISAM offers full-text searching, which I use for small amounts of text (less than a few GB).
Other searches include using keys and indexes, which might lead you to PostgreSQL for its rock solid ACID compliance or MySQL with InnoDB.
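To make the idea of a search algorithm sitting beside the database more concrete, here is a minimal sketch of an inverted index, the basic structure that full-text engines such as Lucene build on. It is a toy under stated assumptions: the `InvertedIndex` class and `tokenize` helper are made up for this example, everything lives in memory, and there is no ranking, stemming or stop-word handling.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Maps each term to the set of document ids that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every term in the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        # Copy the first posting set so intersecting never mutates the index.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add(1, "Red Toyota pickup, low mileage")
index.add(2, "Blue Toyota sedan")
index.add(3, "Red Ford pickup")

print(sorted(index.search("toyota")))
print(sorted(index.search("red pickup")))
```

In a real deployment the document ids would be primary keys from the database, so a search returns ids and the rows themselves are then fetched from MySQL or PostgreSQL by key.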
What is "search"? What are you looking for and what kind of queries do you expect?
PostgreSQL is very powerful: it has full text search and B-tree, hash, GIN and GiST indexes. You can also configure your own types and operators, so everything is there to optimize your searches in the database. It's up to you to use and tweak it for your situation.
PostgreSQL is easy to use with PHP, no problem at all. And it's free, under a BSD-style licence.
Depending on what you mean by “search”, any database system might work. (MySQL is a well-known and fast RDBMS.)
If what you are looking for really is “full text search” then you should take a look into MySQL FULLTEXT indices (only usable with the MyISAM backend, IIRC), Lucene or Xapian.
The Zend Framework (written in PHP) has a ready-made adapter for Lucene, see: http://devzone.zend.com/article/91

Scaling phpBB?

I'm looking to scale an existing phpBB installation by separating the read queries from the write queries to two separate, replicated MySQL servers. Anyone succeeded in doing this, specifically with phpBB?
The biggest concern I have so far is that it seems like the queries are scattered haphazardly throughout the code. I'd love to hear if anyone else did this, and if so, how it went / what was the process.
You could try MySQL Proxy which would be an easy way to split the queries without changing the application.
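For readers who want to see what "splitting the queries" amounts to, the routing decision can be sketched as below. This is not how MySQL Proxy itself is implemented; it is an illustrative sketch in which `QueryRouter`, `is_read_query` and the keyword list are assumptions made up for this example, and the two connections are represented by plain strings.

```python
READ_KEYWORDS = {"SELECT", "SHOW", "DESCRIBE", "EXPLAIN"}


def is_read_query(sql):
    """Best-effort guess at whether a statement only reads data."""
    words = sql.lstrip(" \t\r\n(").split(None, 1)
    if not words:
        return False
    if words[0].upper() not in READ_KEYWORDS:
        return False
    # SELECT ... FOR UPDATE takes row locks, so treat it as a write.
    return "FOR UPDATE" not in sql.upper()


class QueryRouter:
    """Picks the primary for writes and the replica for plain reads."""

    def __init__(self, primary, replica):
        self.primary = primary
        self.replica = replica

    def pick(self, sql, in_transaction=False):
        # Inside a transaction every statement must hit the same server.
        if in_transaction or not is_read_query(sql):
            return self.primary
        return self.replica


router = QueryRouter(primary="primary-db", replica="replica-db")
print(router.pick("SELECT post_id FROM phpbb_posts WHERE topic_id = 5"))
print(router.pick("UPDATE phpbb_topics SET topic_views = topic_views + 1"))
```

When in doubt the sketch falls back to the primary, which is the safe direction. One thing no router fixes by itself is replication lag: a read sent to the replica immediately after a write may not see that write yet, so pages that read their own writes may need to stay on the primary.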
Just add more RAM. Enough RAM to hold the entire database. You'll be surprised how fast your inefficient script will fly. Memory forgives a lot of database scaling mistakes.
I know this was asked a long time ago, but I'd like to share what I experienced, in case it can help anyone.
If your problem is table locks, and given that phpBB's default storage engine at the time was MyISAM, have you looked at moving to the InnoDB storage engine?
Just find out which tables are most frequently locked, and convert those to InnoDB. The sessions table is the first candidate here, although you may want to look at other optimizations (such as storing session data only in memcache or something) if that is your main bottleneck.
