PHP Lucene indexing fails on Linux after 2,000,000 system blocks

I have been building a search index using the latest version of Zend Framework. The interface and everything else work fine.
The problem I have now is the re-indexing, i.e. the creation of the index. I have already sanitized the data and double-checked its quality.
The process always stops at around record 15,000, when the index directory reaches the limit of 2,000,000 blocks. So I decided to build a Java application with Lucene 3.0.3 to run the indexing instead, but Zend cannot read the resulting index:
Fatal error: Uncaught exception 'Zend_Search_Lucene_Exception' with message 'Unsupported segments file format' in
It seems the latest index format supported by Zend Lucene is 2.3.
Any ideas how to solve this problem? I would really appreciate your input.

I have no experience with this, but the Zend Lucene documentation states that the currently supported Lucene index version is 2.3, so an index written by version 3.0.3 may not be supported.
[1] The currently supported Lucene index file format version is 2.3 (starting from Zend Framework 1.6).
See: http://framework.zend.com/manual/en/zend.search.lucene.java-lucene.html

I customized the example from http://www.techcrony.info/?p=33, which reads text files from a data directory. The customized functions read the data from the MySQL database instead:
public static void main(String[] args) throws Exception
{
    ....
    System.out.print("Index dir arg_0 : " + indexDir + "\r");
    String id = "%";
    long start = new Date().getTime();
    int numIndexed = index_main(indexDir, id);
    long end = new Date().getTime();
    System.out.print("End Program... \r");
}
private static int index_main(File indexDir, String id) throws IOException {
    int numIndexed = 0;
    try {
        IndexWriter writer =
            new IndexWriter(indexDir, new StandardAnalyzer(), true);
        writer.setUseCompoundFile(false);
        java.sql.Connection conn = linktodata();
        int rowCount = 0;
        ...
As you can see, I used lucene-core-2.3.0.jar so that the index is written in a format Zend can read. Compile:
javac -cp .:lucene-core-2.3.0.jar:mysql-connector-java-5.1.16-bin.jar Indexer.java
Run:
java -cp .:lucene-core-2.3.0.jar:mysql-connector-java-5.1.16-bin.jar Indexer /home/public_html/index_main
Now the most important question: does anyone know whether PHP Lucene can handle more than 1,000,000 documents?

Related

How to discover php COM class

I want to do some work on MS Word documents using PHP (mostly search and replace), and I've discovered the COM class. But I have a question (I'm also relatively new to PHP): how can I find all the methods related to Word? For example:
$objBookmark = $word->ActiveDocument->Bookmarks($bookmarkname);
$range = $objBookmark->Range;
How am I supposed to know about ActiveDocument, Bookmarks, Range, etc.? Is there any way to get a list of all of these?
To get autocomplete (IntelliSense) you need a PHP class model and a good IDE. This will make your work much easier.
One of the best IDEs I have come across is PHP Tools for Visual Studio. Look for it in the Visual Studio plugin gallery or on the Devsense website. You can use Visual Studio for free with the Community edition.
To generate a PHP-based class model you can either use the built-in COM function:
com_print_typeinfo ( object $comobject [, string $dispinterface [, bool $wantsink = false ]] )
Or the NetPhp component (the link below is a step-by-step example with Word Interop), which dumps all types inside the binary to a complete PHP-based class model:
http://www.drupalonwindows.com/en/blog/php-com-class-consuming-net-php
You can also use NetPhp to directly consume the OpenXML binaries and manipulate Word documents without Interop:
http://www.codeproject.com/Tips/994905/Edit-Word-Documents-using-OpenXML-and-Csharp-Witho
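As a minimal sketch of the first option, the snippet below captures what com_print_typeinfo() prints for a COM object and saves it to a text file that can be browsed for property and method names. It assumes Windows with the COM extension enabled and Word installed; the helper name captureTypeInfo and the output file name are illustrative only. On systems without the COM class the script does nothing. The same helper could be called on $word->ActiveDocument or a Range object to list their members.

```php
<?php
// Hypothetical helper: com_print_typeinfo() echoes its result, so the
// output is captured with output buffering and returned as a string.
function captureTypeInfo($comObject): string
{
    ob_start();
    com_print_typeinfo($comObject);
    return (string) ob_get_clean();
}

// Only attempt this where the COM extension is available (Windows).
if (class_exists('COM')) {
    $word = new COM('Word.Application');
    // Illustrative file name; the dump lists the members of the
    // application object, such as ActiveDocument.
    file_put_contents('word_typeinfo.txt', captureTypeInfo($word));
    // Close the Word instance that was started for the dump.
    $word->Quit();
}
```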

Mongo-PHP - MongoCursor exception with MongoDB PHP Driver v1.6

I've been having trouble with the PHP MongoCursor since I upgraded the Mongo PHP driver from 1.5.8 to 1.6.0.
The following code works well with version 1.5.8 but crashes with version 1.6.
The PHP version is 5.5.21 and the Apache version is Apache/2.4.10 (Ubuntu).
$mongoClient = new \MongoClient($serverUrl, ['readPreference'=>\MongoClient::RP_NEAREST]);
$database = $mongoClient->selectDB($dbName);
$collection = $database->selectCollection($collectionName);
// count() works fine and returns the right number of documents
echo '<br/>count returned '.$collection->count();
// find() executes with no error...
$cursor = $collection->find();
$documents = [];
// ...and hasNext() crashes with the exception below
while($cursor->hasNext()){$documents[] = $cursor->getNext();}
return $documents;
The hasNext() call crashes with this message:
CRITICAL: MongoException: The MongoCursor object has not been correctly initialized by its constructor (uncaught exception)...
Am I doing something wrong?
Thanks for your help!
This may be related to a bug that was introduced in 1.6.0 regarding iteration with hasNext() and getNext(): PHP-1382. A fix has since been merged to the v1.6 branch and should be released later this week as 1.6.1.
That said, the bug regarding hasNext() was actually that the last document in the result set would be missed while iterating. If I run your original script against 1.6.0, the array contains a null value as its last element. With the fix in place, the array will contain all documents as is expected. I cannot reproduce the exception you're seeing with either version.
That exception is actually thrown from internal checks on the C data structures, which ensure that the cursor object is properly associated with a MongoClient and socket connection. See the MONGO_CHECK_INITIALIZED() macro calls in this file. Almost all of the cursor methods check that a MongoClient is associated, but hasNext() is unique in that it also checks for the socket object (I believe other methods just assume a cursor with a MongoClient also has a socket). If that exception is truly reproducible for you and you're willing to do some debugging with the extension, I'd be very interested to know which of the two checks is throwing the error.
As a side note, you should also be specifying the "replicaSet" option when constructing MongoClient. This should have the replica set name, which ensures that the driver can properly ignore connections to hosts that are not a member of the intended replica set.
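A minimal sketch of that side note, assuming a replica set named rs0 (a made-up name; use whatever name your deployment was configured with). The helper buildClientOptions is illustrative and only builds the options array, so it runs without the extension loaded.

```php
<?php
// Hypothetical helper: builds the options array passed to MongoClient.
function buildClientOptions(string $replicaSetName): array
{
    return [
        // Name of the replica set the driver should restrict itself to.
        'replicaSet'     => $replicaSetName,
        // Same string value as \MongoClient::RP_NEAREST in the legacy driver.
        'readPreference' => 'nearest',
    ];
}

$options = buildClientOptions('rs0');

// With the legacy driver loaded, the array would be used like this:
// $mongoClient = new \MongoClient($serverUrl, $options);
```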
I just encountered the same issue; I refactored my code to use the cursor iterator instead, i.e.:
foreach( $cursor as $doc ) {
$documents[] = $doc;
}
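The same idea as a small reusable function, as a sketch. MongoCursor implements Iterator, so a real cursor can be passed in; here an ArrayIterator with made-up documents stands in for $collection->find() so the snippet runs without the extension. The function name collectDocuments is illustrative.

```php
<?php
// Collects every document from any traversable cursor using foreach,
// which avoids calling hasNext()/getNext() directly.
function collectDocuments(Traversable $cursor): array
{
    $documents = [];
    foreach ($cursor as $doc) {
        $documents[] = $doc;
    }
    return $documents;
}

// Stand-in for $collection->find(); the documents are sample data.
$cursor = new ArrayIterator([
    ['_id' => 1, 'name' => 'a'],
    ['_id' => 2, 'name' => 'b'],
]);

$documents = collectDocuments($cursor);
echo count($documents), "\n";
```

PHP's built-in iterator_to_array($cursor, false) gives the same result in one call.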
I was looking for a code example of how to implement a tailable cursor and found this question. The following code is a simple example of a tailable cursor (the $cursor variable), which must be obtained from a capped MongoDB collection.
$cursor->tailable(true);
$cursor->awaitData(true);
while (true) {
    if ($cursor->hasNext()) {
        var_dump($cursor->getNext());
    } elseif ($cursor->dead()) {
        break;
    }
}

Error while accessing Boost interprocess Shared Memory Region using Apache web interface [duplicate]

I have written a (key, value) map in a shared memory region using C++ and the Boost library.
void CreateIndexMap()
{
    shared_memory_object::remove(Getsharedmemoryregion());
    managed_shared_memory segment(create_only, Getsharedmemoryregion(), 10000000);
    void_allocator alloc_inst(segment.get_segment_manager());
    complex_map_type *mymap = segment.construct<complex_map_type>("MyMap")(std::less<char_string>(), alloc_inst);
}
Inserting entries into the map in the shared region:
void UpdateIndexMap(std::string str, std::string index, const char* SharedMemory)
{
    managed_shared_memory segment(open_only, SharedMemory);
    void_allocator alloc_inst(segment.get_segment_manager());
    complex_map_type *mymap = segment.find<complex_map_type>("MyMap").first;
    std::string h = ConvertTolowercase(str);
    char_string patternvalue(h.c_str(), alloc_inst);
    char_string indexvalue(index.c_str(), alloc_inst);
    mymap->insert(std::pair<char_string, char_string>(patternvalue, indexvalue));
}
Now I am developing a web application in PHP and want to read the map in the shared region to get the data. How can I implement this?
Ah, just noticed that other question was also by you.
You really do not want to complicate the matter by trying to embed the C++ code directly into PHP.
It is bound to be far simpler to find out why the child process spawned from a PHP page doesn't allow you to access shared memory. In the worst possible case, make the process very secure and just call setuid to force it to impersonate a certain user (assuming a UNIX-flavoured host). Do not setuid to root (this is a security no-no).
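One way to keep the C++ out of PHP is a small command-line reader that opens the segment and prints the map, with PHP only running it and parsing the output. The sketch below assumes such a reader exists and prints one key<TAB>value pair per line; neither the reader nor that output format comes from the posts above, and the function name readMapFromHelper is illustrative. To keep the sketch runnable on any POSIX shell, printf stands in for the reader. When the real reader is run from a web page, one thing worth checking is whether the web server user has permission to open the segment.

```php
<?php
// Runs a helper command and parses "key<TAB>value" lines into an array.
function readMapFromHelper(string $command): array
{
    exec($command, $lines, $status);
    if ($status !== 0) {
        throw new RuntimeException("Helper exited with status $status");
    }
    $map = [];
    foreach ($lines as $line) {
        // Split on the first tab only, so values may contain tabs.
        $parts = explode("\t", $line, 2);
        if (count($parts) === 2) {
            $map[$parts[0]] = $parts[1];
        }
    }
    return $map;
}

// printf is a stand-in for the hypothetical C++ reader binary; in practice
// the command would be the path to that binary.
$map = readMapFromHelper("printf 'foo\\t1\\nbar\\t2\\n'");
print_r($map);
```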

Where is the tutorial "processor" generated?

I just started working with the GitHub-hosted build of Apache Thrift. I am mainly interested in a Java client and a PHP server implementation, but for now I am using a PHP server and client only.
All nice and easy; I wrote my Thrift file:
namespace php mobiledata
struct sms
{
    1: string from,
    2: string to,
    3: string smstext,
    4: string smsdatetime,
    5: string smsdirection
}
struct smsdetails
{
    1: list<sms> smsdata
}
service mobiledataservice
{
    void insertsmsdata (1: smsdetails smslist)
}
Then I generated the gen-php folder, which contains Types.php and mobiledataservice.php.
The basic PHP server sample that comes with the GitHub repository contains these lines:
$handler = new CalculatorHandler();
$processor = new \tutorial\CalculatorProcessor($handler);
I can't find this CalculatorProcessor class, and I certainly don't have a comparable class, like mobiledataprocessor, generated in my gen-php folder. It baffles me how I would run my server without a processor.
The server code is generated by calling
thrift -r -gen php:server tutorial.thrift
Note the :server part after -gen php; this triggers the processor generation.
These are all the PHP options available:
php (PHP):
    inlined: Generate PHP inlined files
    server: Generate PHP server stubs
    oop: Generate PHP with object oriented subclasses
    rest: Generate PHP REST processors

mysql and php with restler for webservice

Does anyone know how to set up Restler with PHP and MySQL to produce something like the following?
I want to create an XML API web service and am not sure where to start.
I want people to be able to query the database over HTTP for information such as the following.
Example data:
BrandName
Price
ShortDescription
SKU
Example query:
http://website.com/productxml?dep=1&Count=3&BrandName=Y&Price=Y
How would I go about writing such a script? I have searched the internet and can't find any examples, so I was wondering if you could help.
Thanks in advance
Roy
You could use Restler (http://luracast.com/products/restler/) and build a method
class YourClass {
    public function productxml($dep, $Count, $BrandName, $Price) {
        // your MySQL stuff
    }
}
which handles your request.
See the examples (http://help.luracast.com/restler/examples/) for how this can be done.
Hope this helps.
Greets.
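The data-shaping part of such a method might look like the sketch below. It assumes the Y flags in the example query select which columns to return and that Count caps the number of rows; both are interpretations of the question, not documented behaviour. A hard-coded array with made-up rows stands in for the MySQL result, and selectProductFields is an illustrative name, not part of Restler. The method would return this array and leave the XML encoding to whichever format class Restler is configured with.

```php
<?php
// Keeps only the columns whose flag is 'Y' and caps the number of rows.
function selectProductFields(array $rows, array $flags, int $count): array
{
    // Build a lookup of the requested column names.
    $wanted = [];
    foreach ($flags as $column => $flag) {
        if ($flag === 'Y') {
            $wanted[$column] = true;
        }
    }

    $result = [];
    foreach (array_slice($rows, 0, $count) as $row) {
        // array_intersect_key keeps the row's own column order.
        $result[] = array_intersect_key($row, $wanted);
    }
    return $result;
}

// Sample rows standing in for the MySQL result set (made-up data).
$rows = [
    ['BrandName' => 'Acme', 'Price' => '9.99', 'ShortDescription' => 'Widget', 'SKU' => 'A-1'],
    ['BrandName' => 'Bolt', 'Price' => '4.50', 'ShortDescription' => 'Screw', 'SKU' => 'B-2'],
];

$products = selectProductFields($rows, ['BrandName' => 'Y', 'Price' => 'Y', 'SKU' => 'N'], 1);
print_r($products);
```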
You could use Luracast Restler.
Development has picked up a lot and it is stable.
The fun part about this framework is that it supports multiple formats. Each format can be added with a single line of code:
require_once '../../../vendor/restler.php';
use Luracast\Restler\Restler;
$r = new Restler();
$r->setSupportedFormats('JsonFormat', 'XmlFormat'); // add formats here
$r->addAPIClass('BMI');
$r->handle();
I would also like to point to my Luracast Restler template on Bitbucket; it is public and there for everybody to see.
I combined Restler with Doctrine, so fetching data from databases has never been easier. It's a raw version for now, but I'll update it soon.
My version uses Vagrant, a layer on top of virtualisation technology that makes development setup easy and fast. Once your application is ready you can deploy it to your server.
Link:Restler+Doctrine
1) Install virtualbox + vagrant
2) Clone my repository
3) Move to the cloned directory.
4) vagrant up
5) Enjoy and start programming your REST API in less than 10 minutes.
