I have a Solr box which is fed by a PHP cronjob right now.
I want to speed things up and save some memory by switching to a C++ process.
I don't want to reinvent the wheel by creating a new library.
The only thing is I can't find a library for Solr in C++.
Otherwise I will probably have to create one myself using cURL.
Does any of you know of a Solr library written in C++?
Thanks.
There is an attempt to create a C++-API for Solr.
Take a look at this project:
http://code.google.com/p/solcpp/
With "fed" do you mean documents are passed for indexing? You'll probably find that the process that is doing the "feeding" is not the bottleneck but rather how quickly Solr can ingest documents.
I'd also recommend some profiling before you do a lot of work because the process is usually not CPU bound, so the speed increase you'll get by moving to C++ will disappointing.
Have you optimised your schema as much as possible? The two obvious first steps are:
1. Don't store data that is not needed for display (field IDs, metadata, etc.)
...and the opposite of that...
2. Don't index data that is ONLY used for display and never searched on (supplementary data).
And a quirky thing to try that sometimes works and sometimes doesn't: changing the add/overwrite attribute to false.
<add overwrite="false">
This disables the unique id check (I think). So if you are doing a full wipe/replace of the index, and you are certain to only be adding unique documents, then this can speed up the import. It really depends on the size of the index, though. If you have over 2,000,000 documents, then every time the indexer adds a new one you gain a bit of speed by not forcing it to check whether that document already exists. Not the most eloquent explanation, but I hope it makes sense.
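To illustrate, from the existing PHP feeder an add with overwrite="false" could be posted roughly like this; this is only a sketch, and the Solr URL and field names are assumptions, not taken from your setup:

<?php
// Sketch only: post a batch with overwrite="false" to Solr's XML update handler.
$xml = '<add overwrite="false">'
     . '<doc>'
     . '<field name="id">12345</field>'
     . '<field name="title">Example document</field>'
     . '</doc>'
     . '</add>';

$ch = curl_init('http://localhost:8983/solr/update');   // assumed Solr URL
curl_setopt_array($ch, array(
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => $xml,
    CURLOPT_HTTPHEADER     => array('Content-Type: text/xml; charset=utf-8'),
    CURLOPT_RETURNTRANSFER => true,
));
$response = curl_exec($ch);
curl_close($ch);
// Don't forget to send a <commit/> once the whole batch has been posted.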
Personally, I use the Data Import Handler, which cuts out the need for an intermediary script. It just hooks up to the DB and sucks out the info it needs with a single query.
I'm going to add a simple live search to a website (suggestions while the user types in an input box).
Main task:
39k plain-text lines to search in (~500 characters per line, 4 MB total size)
1k online users can be typing in the input box simultaneously
In some cases 2k-3k results can match a user's request
I'm worried about the following questions:
Database vs. text file?
Are there any general rules or best practices for my task aimed at decreasing DB/server memory load (caching/indexing/etc.)?
Are Sphinx/Solr appropriate for such a task?
Any links/advice will be extremely helpful.
Thanks
P.S. Maybe this is the best solution: use PHP to search within the txt file and echo the whole line?
Put your data in a database (SQLite should do just fine, but you can also use a more heavy-duty RDBMS like MySQL or Postgres), and put an index on the column or columns that will be searched.
Only do the absolute minimum, which means that you should not use a framework, an ORM, etc. They will just slow down your code.
Create a PHP file, grab the search text and do a SELECT query using a native PHP driver, such as SQLite, MySQLi, PDO or similar.
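A minimal sketch of such an endpoint using PDO with SQLite; the database file name and the lines(content) table are assumptions:

<?php
// search.php - minimal live-search endpoint (sketch).
$q = isset($_GET['q']) ? trim($_GET['q']) : '';

header('Content-Type: application/json');
if (strlen($q) < 3) {                    // minimum character limit, see below
    echo json_encode(array());
    exit;
}

$pdo = new PDO('sqlite:search.db');      // assumed database file
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// A prefix search ('term%') can use the index on content; '%term%' cannot.
$stmt = $pdo->prepare('SELECT content FROM lines WHERE content LIKE :q LIMIT 20');
$stmt->execute(array(':q' => $q . '%'));

echo json_encode($stmt->fetchAll(PDO::FETCH_COLUMN));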
Also, think about how the search box will work. You can prevent many requests if you e.g. put a minimum character limit (it does not make sense to search only for one or two characters), put a short delay between sending requests (so that you do not send requests that are never used), and so on.
Whether or not to use an extension such as Solr depends on your circumstances. If you have a lot of data, and a lot of requests, then maybe you should look into it. But if the problem can be solved using a simple solution then you should probably try it out before making it more complicated.
I have implemented 'live search' many times, always using AJAX to query the database (MySQL), and I haven't had or observed any speed or heavy load issues yet.
Anyway, I have seen an implementation using Solr, but I cannot say whether it was quicker or consumed fewer resources.
It completely depends on the HW the server will run on, IMO. As I wrote elsewhere, I have seen a server with a very slow filesystem, so implementing live search by reading and parsing txt files (or using Solr) could be slower than querying the database. On the other hand, you can be hosted on poor shared webhosting with a slow DB connection (which gets even slower with more concurrent connections), so that won't be the best solution either.
My suggestion: use MySQL with AJAX (look at this jQuery plugin or this article), set proper INDEXes on the searched columns, and if that turns out to be slow you can still move to a txt file.
In the past, I have used Zend_Search_Lucene with great success.
It is a general-purpose text search engine written entirely in PHP 5. It manages the indexing of your sources and is quite fast (in my experience). It supports many query types, search fields, and search ranking.
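For reference, a minimal indexing and searching sketch in the Zend Framework 1 style; the index path and field names are assumptions:

<?php
require_once 'Zend/Search/Lucene.php';   // assumes ZF1 is on the include path

// Build the index (e.g. from a cron job).
$index = Zend_Search_Lucene::create('/tmp/my_index');    // assumed index path
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::UnIndexed('url', '/products/1'));
$doc->addField(Zend_Search_Lucene_Field::Text('title', 'Example product'));
$doc->addField(Zend_Search_Lucene_Field::UnStored('body', 'Full text that is searched but not stored.'));
$index->addDocument($doc);
$index->commit();

// Query the index (e.g. from the search page).
$index = Zend_Search_Lucene::open('/tmp/my_index');
foreach ($index->find('example') as $hit) {
    echo $hit->title, ' => ', $hit->url, "\n";
}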
I am currently working on an existing website that lists products; there are currently a little over 500 products.
The website has a text file for every product, and I want to add a search option. I'm thinking of reading all the text files once a day and creating an XML document with the values that can be searched.
The client indicated that they want to add products and are used to adding them via the text files. There might be over 5000 products in the future, so I think it's best to do this with MySQL. This means importing the current products and creating a CRUD page for products.
Does anyone have experience with a PHP website that does not use MySQL? Is it possible to keep adding text files and just index them once a day even if it would mean having over 5000 products?
5000 seems like an amount that's still manageable to index with a daily cron job. As long as you don't plan on searching them in real time, it should work. It's not ideal, but it would work.
Yes, it is very much possible, but NOT advisable, to use files for this type of transaction.
It is also better to use XML instead of plain TXT files for the job. 5000 products, with whatever data is associated with them, might create problems in the future.
PS
Why not MySQL?
MySQL was made because file-based databases are slow and inaccurate.
Just use MySQL. If you want to keep your old txt-based database, just build a simple script that will import each file one by one and create the corresponding rows in your SQL database.
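Something along these lines could do the import; the directory layout, file format (first line = product name, rest = description) and the products table are all assumptions:

<?php
// One-off import of product text files into MySQL (sketch).
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$stmt = $pdo->prepare('INSERT INTO products (name, description) VALUES (:name, :description)');

foreach (glob('products/*.txt') as $file) {
    $lines = file($file, FILE_IGNORE_NEW_LINES);
    $stmt->execute(array(
        ':name'        => array_shift($lines),
        ':description' => implode("\n", $lines),
    ));
}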
Good luck.
It's possible; however, if this is anything more than a simple online catalogue, then managing transaction integrity is horrendously difficult - and the fact that you're even asking the question implies that you are not in a good position to implement the kind of controls required. And as you've already discovered, it doesn't make for easy searching. (BTW: MySQL's fulltext indexing is a very blunt instrument - it's not a huge amount of effort to implement an effective search engine yourself - or there are excellent ones available off-the-shelf, e.g. mnogosearch.)
(As an incidental point, why XML? It makes managing the data much more complicated than it needs to be.)
and create a crud page for products
Why? If the client wants to maintain the data via file uploads and you already need to port the data, then just use the same interface - where the data is stored is not relevant just now.
If there are issues with hosting + MySQL, then using SQLite gives most of the benefits (although it won't scale as well).
I am doing a small website project. On a page there is a section where the client posts new updates; at any given time there will be a maximum of 5 to 6 posts in this section. I was trying to create a MySQL database for the content, but I wonder if there is any way I could have all the entries as XML files and use PHP to parse them. Is it possible?
Which one is the better option MySQL or XML?
XML is a horrid piece of crap in my opinion. It's bloated and rather unpleasant to work with. However, it is a viable option as long as your number of entries and the amount of traffic stays small.
You can use SimpleXML to parse the XML, but the performance is going to degrade as file size increases. MySQL, however, will handle quite a lot of data before performance becomes a concern provided the schema is properly setup.
If you do use XML, you could always use a half-way XML solution: parse the file once, then store a serialized array of it.
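A sketch of that half-way approach; the file names and the XML structure (<posts> containing <post> elements with <title> and <body>) are assumptions:

<?php
// Parse posts.xml once, then reuse a serialized copy until the XML changes.
$xmlFile   = 'posts.xml';
$cacheFile = 'posts.cache';

if (is_file($cacheFile) && filemtime($cacheFile) >= filemtime($xmlFile)) {
    $posts = unserialize(file_get_contents($cacheFile));
} else {
    $posts = array();
    foreach (simplexml_load_file($xmlFile)->post as $post) {
        $posts[] = array('title' => (string) $post->title, 'body' => (string) $post->body);
    }
    file_put_contents($cacheFile, serialize($posts));
}

foreach ($posts as $p) {
    echo $p['title'], "\n";
}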
Though really, if you're going to store it in a file of some sort, I would suggest, in order: SQLite, serialized array, JSON, XML. (Depending on your situation that order may change.)
If you abstract away the low level details enough, you should be able to make adapters that can be used interchangeably, thus allowing you to easily switch out storage backends. (On a large project, that would likely be unfeasible, but it sounds like your data storage/retrieval will remain fairly simple.)
I am creating a small plugin that process some images. Now to report the progress back to the user I have a small ajax script that will long poll the results back.
Now internally I need an object that keeps track of what has been processed. The options I am aware of are the following.
Using the PHP session object. I cannot use this in this specific case, because the initial process is also started by ajax. So the main process is an ajax call, and the long poll is another ajax call. They have two different session IDs, so they do not communicate well.
The second option is to use the database as the storage. I do not know if this is so good, because there will be around 40 reads/writes for an average job. I know that this is no problem, but it seems a little much for something so simple.
What I'm actually looking for is a sort of in-memory object, if that is possible: create a small object in memory that is rapidly updated with the progress and deleted when we are done.
I do not know if such a thing exists in PHP and whether I can make use of it. Note that this will be a public plugin, so I need to work with methods that are available on all kinds of systems, nothing special.
I think the database is not the worst solution. If you consider writing to disk, that could be worse.
Memcache is good, but you need a small, "external plugins free" plugin that runs easily on Windows, Linux, Mac and so on... so it is not a good option.
If you use MySQL, you can use MEMORY engine tables, which are fast, and truncate or clean them periodically with a simple garbage-collector algorithm. And if a memory table is not an option, InnoDB is good enough.
You can use memcache for this.
http://php.net/manual/en/book.memcache.php
As a key you can use an md5 hash of the image file plus the user's IP.
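A rough sketch of that; the memcached server address, key scheme and payload shape are assumptions:

<?php
// Progress tracking via the Memcache extension (sketch).
$memcache = new Memcache();
$memcache->connect('localhost', 11211);

$key = md5_file('/path/to/image.jpg') . '_' . $_SERVER['REMOTE_ADDR'];

// Worker side: update the progress while processing.
$memcache->set($key, array('done' => 12, 'total' => 40), 0, 3600);

// Long-poll side: read it back and report to the browser.
$progress = $memcache->get($key);
header('Content-Type: application/json');
echo json_encode($progress ? $progress : array('done' => 0, 'total' => 0));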
Have a look at Redis, "an open source, advanced key-value store", I think you will like it.
You need to run a Redis server, and access it with different clients. The client of choice for PHP is Predis. The usage is very simple:
$single_server = 'tcp://127.0.0.1:6379';      // connection parameters for the Redis server
$client = new Predis\Client($single_server);
$client->set('library', 'predis');
$retval = $client->get('library');            // "predis"
A database is not a bad idea when you are using a HEAP (MEMORY) table. Sometimes you simply don't have memcache on the server.
Check the MEMORY storage engine in the MySQL documentation.
You want easy, highly portable shared memory between PHP processes, regardless of how PHP is installed? Use a MySQL MEMORY table. PHP without MySQL installed is pretty rare.
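A rough sketch of the MEMORY-table approach; the table and column names are assumptions:

<?php
// Progress tracking in a MySQL MEMORY table (sketch).
$pdo = new PDO('mysql:host=localhost;dbname=plugin', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE IF NOT EXISTS job_progress (
    job_id VARCHAR(64) NOT NULL PRIMARY KEY,
    done   INT NOT NULL,
    total  INT NOT NULL
) ENGINE=MEMORY');

// Worker side: upsert the current progress.
$pdo->prepare('REPLACE INTO job_progress (job_id, done, total) VALUES (?, ?, ?)')
    ->execute(array('job42', 12, 40));

// Long-poll side: read it back.
$stmt = $pdo->prepare('SELECT done, total FROM job_progress WHERE job_id = ?');
$stmt->execute(array('job42'));
echo json_encode($stmt->fetch(PDO::FETCH_ASSOC));

// Clean up when the job is finished.
$pdo->prepare('DELETE FROM job_progress WHERE job_id = ?')->execute(array('job42'));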
So I'm going to be working on a home-made blog system in PHP, and I was wondering which way of storing data is the fastest. I could go in the MySQL direction, or I could go with my own little way of doing it, which is storing all of the information (encoded in JSON) in files.
Which way would be the fastest, MySQL or JSON files?
For a small, single-user 'database', a file system would likely be quicker - as the size and complexity grow, a database server like MySQL or SQL Server is hard to beat.
I would definitely choose a DB option (as you need to be able to search and index stuff). But that does not mean you need a fully realized, separate DB service.
MySQL is definitely the more scalable solution.
But the downside is you need to set up and maintain a separate service.
On the other hand, there are DBs that are file based and still give you access with standard SQL - SQLite (sqlite.org) jumps to mind. You get the advantages of SQL, but you do not need to maintain a separate service. The disadvantage is that they are not as scalable.
I would choose a MySQL database - simply because it's easier to manage.
JSON is not really a format for storage; it's for sending data to JavaScript. If you want to store data in files, look into XML or serialized PHP (which I suspect is what you are after, rather than JSON).
Forgive me if this doesn't answer your question very directly, but since it is a home-cooked blog system, is it really worth spending time right now thinking about which storage backend is faster?
You're not going to be looking at 10,000 concurrent users from day 1, and it doesn't sound like it will need to scale to any meaningful degree in the foreseeable future.
Why not just stick with MySQL as a sensible choice rather than a fast one? If you really want some sense that you designed for speed, maybe bolt SQLite on instead.
Since you are thinking you may not have the need for a complex relational structure, this might be a fun opportunity to try something more down the middle.
Check out CouchDB; it is a document-based, schema-free database (yet still indexable). The database is made up of documents that contain named fields (think key-value pairs).
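For a feel of the HTTP API, here is a rough sketch; the server URL, database name ('blog') and document fields are assumptions, and the database is assumed to exist already:

<?php
// Store and fetch a blog post as a CouchDB document (sketch).
$base = 'http://localhost:5984/blog';          // assumed database URL

// PUT a document with a chosen id.
$ch = curl_init($base . '/first-post');
curl_setopt_array($ch, array(
    CURLOPT_CUSTOMREQUEST  => 'PUT',
    CURLOPT_POSTFIELDS     => json_encode(array('title' => 'Hello', 'body' => 'First post')),
    CURLOPT_HTTPHEADER     => array('Content-Type: application/json'),
    CURLOPT_RETURNTRANSFER => true,
));
echo curl_exec($ch);
curl_close($ch);

// GET it back as JSON.
echo file_get_contents($base . '/first-post');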
Have fun....
Though I don't know for certain, it seems to me that a MySQL database would be a lot faster, especially as the amount of data gets larger and larger.
Also, using MySQL with PHP is super easy, especially if you use an abstraction class like ezSQL. ezSQL makes working with a database really simple and I think you'd be creating more unnecessary work for yourself by going the home-brewed JSON direction.
I've done both. I like files for very simple problems and databases for complicated problems.
For file solutions, note these problems as the number of files increases:
1) Much more disk space is used than you might expect, because even tiny files use up a whole block. Blocks are fairly large on filesystems which support large drives.
2) Most filesystems get very slow when the number of files in a directory gets very large. My solution to this (assuming the names of the files are reasonably spread out across the alphabet) is to create a directory named after the first two letters of the filename. Thus the file "animal.txt" would be found at an/animal.txt. This works surprisingly well. If your filenames are not reasonably well distributed across the alphabet, use some sort of hashing function to create the directories. It sounds a little crazy, but this can work very, very well, and I've used it for very fast solutions with tens of thousands of files.
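A sketch of that bucketing scheme (the base directory is an assumption); the hash variant covers filenames that are not well distributed:

<?php
// Bucket files into subdirectories based on the first two letters (or an md5 hash).
function bucketed_path($baseDir, $filename, $useHash = false)
{
    $bucket = $useHash
        ? substr(md5($filename), 0, 2)           // hash-based bucket
        : strtolower(substr($filename, 0, 2));   // e.g. "an" for animal.txt

    $dir = $baseDir . '/' . $bucket;
    if (!is_dir($dir)) {
        mkdir($dir, 0777, true);
    }
    return $dir . '/' . $filename;
}

// "animal.txt" ends up at data/an/animal.txt
file_put_contents(bucketed_path('data', 'animal.txt'), 'some content');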
But the file solutions really only fit sometimes. Unless you have a great reason to go with files, use a database.
This is really cool. It's a PHP class that controls a flat-file database with queries: http://www.fsql.org/index.php
For blogs, I recommend caching the pages, because blogs usually have mostly static content. This way the queries only get run once, when the page is cached. You can regenerate the cached pages when a new blog post is added.
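A very small sketch of that kind of page cache; the cache directory and the "regenerate when a post is added" policy are assumptions about your setup:

<?php
// Whole-page cache for a blog (sketch).
$cacheDir  = 'cache';
$cacheFile = $cacheDir . '/' . md5($_SERVER['REQUEST_URI']) . '.html';

if (is_file($cacheFile)) {
    readfile($cacheFile);                       // cached copy: no queries run at all
    exit;
}

if (!is_dir($cacheDir)) {
    mkdir($cacheDir);
}

ob_start();
// ... render the page here; this is the only place the database queries run ...
echo '<h1>My blog</h1>';

$html = ob_get_clean();
file_put_contents($cacheFile, $html);           // delete/rebuild this file when a post is added
echo $html;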