How to read FoxPro Memo with PHP? - php

I have to convert .DBF and .FPT files from Visual FoxPro to MySQL. Right now my script works for .DBF files, it opens and reads them with dbase_open() and dbase_get_record_with_names() and then executes the MySQL INSERT commands.
However, some fields of these .DBF files are of type MEMO and therefore stored in a separate files ending in .FPT. How do I read this file?
I have found the specifications of this filetype in MSDN, but I don't know how I can read this file byte-wise with PHP (also, I would really prefer a simplier solution).
Any ideas?

Alright, I have carefully studied the MSDN specifications of DBF and FPT file structures and the outcome is a beautiful PHP class which can open a DBF and (optional) an FPT memo file at the same time. This class will give you record after record and thereby fetch any memos from the memo file - if opened.
class Prodigy_DBF {
private $Filename, $DB_Type, $DB_Update, $DB_Records, $DB_FirstData, $DB_RecordLength, $DB_Flags, $DB_CodePageMark, $DB_Fields, $FileHandle, $FileOpened;
private $Memo_Handle, $Memo_Opened, $Memo_BlockSize;
private function Initialize() {
if($this->FileOpened) {
fclose($this->FileHandle);
}
if($this->Memo_Opened) {
fclose($this->Memo_Handle);
}
$this->FileOpened = false;
$this->FileHandle = NULL;
$this->Filename = NULL;
$this->DB_Type = NULL;
$this->DB_Update = NULL;
$this->DB_Records = NULL;
$this->DB_FirstData = NULL;
$this->DB_RecordLength = NULL;
$this->DB_CodePageMark = NULL;
$this->DB_Flags = NULL;
$this->DB_Fields = array();
$this->Memo_Handle = NULL;
$this->Memo_Opened = false;
$this->Memo_BlockSize = NULL;
}
public function __construct($Filename, $MemoFilename = NULL) {
$this->Prodigy_DBF($Filename, $MemoFilename);
}
public function Prodigy_DBF($Filename, $MemoFilename = NULL) {
$this->Initialize();
$this->OpenDatabase($Filename, $MemoFilename);
}
public function OpenDatabase($Filename, $MemoFilename = NULL) {
$Return = false;
$this->Initialize();
$this->FileHandle = fopen($Filename, "r");
if($this->FileHandle) {
// DB Open, reading headers
$this->DB_Type = dechex(ord(fread($this->FileHandle, 1)));
$LUPD = fread($this->FileHandle, 3);
$this->DB_Update = ord($LUPD[0])."/".ord($LUPD[1])."/".ord($LUPD[2]);
$Rec = unpack("V", fread($this->FileHandle, 4));
$this->DB_Records = $Rec[1];
$Pos = fread($this->FileHandle, 2);
$this->DB_FirstData = (ord($Pos[0]) + ord($Pos[1]) * 256);
$Len = fread($this->FileHandle, 2);
$this->DB_RecordLength = (ord($Len[0]) + ord($Len[1]) * 256);
fseek($this->FileHandle, 28); // Ignoring "reserved" bytes, jumping to table flags
$this->DB_Flags = dechex(ord(fread($this->FileHandle, 1)));
$this->DB_CodePageMark = ord(fread($this->FileHandle, 1));
fseek($this->FileHandle, 2, SEEK_CUR); // Ignoring next 2 "reserved" bytes
// Now reading field captions and attributes
while(!feof($this->FileHandle)) {
// Checking for end of header
if(ord(fread($this->FileHandle, 1)) == 13) {
break; // End of header!
} else {
// Go back
fseek($this->FileHandle, -1, SEEK_CUR);
}
$Field["Name"] = trim(fread($this->FileHandle, 11));
$Field["Type"] = fread($this->FileHandle, 1);
fseek($this->FileHandle, 4, SEEK_CUR); // Skipping attribute "displacement"
$Field["Size"] = ord(fread($this->FileHandle, 1));
fseek($this->FileHandle, 15, SEEK_CUR); // Skipping any remaining attributes
$this->DB_Fields[] = $Field;
}
// Setting file pointer to the first record
fseek($this->FileHandle, $this->DB_FirstData);
$this->FileOpened = true;
// Open memo file, if exists
if(!empty($MemoFilename) and file_exists($MemoFilename) and preg_match("%^(.+).fpt$%i", $MemoFilename)) {
$this->Memo_Handle = fopen($MemoFilename, "r");
if($this->Memo_Handle) {
$this->Memo_Opened = true;
// Getting block size
fseek($this->Memo_Handle, 6);
$Data = unpack("n", fread($this->Memo_Handle, 2));
$this->Memo_BlockSize = $Data[1];
}
}
}
return $Return;
}
public function GetNextRecord($FieldCaptions = false) {
$Return = NULL;
$Record = array();
if(!$this->FileOpened) {
$Return = false;
} elseif(feof($this->FileHandle)) {
$Return = NULL;
} else {
// File open and not EOF
fseek($this->FileHandle, 1, SEEK_CUR); // Ignoring DELETE flag
foreach($this->DB_Fields as $Field) {
$RawData = fread($this->FileHandle, $Field["Size"]);
// Checking for memo reference
if($Field["Type"] == "M" and $Field["Size"] == 4 and !empty($RawData)) {
// Binary Memo reference
$Memo_BO = unpack("V", $RawData);
if($this->Memo_Opened and $Memo_BO != 0) {
fseek($this->Memo_Handle, $Memo_BO[1] * $this->Memo_BlockSize);
$Type = unpack("N", fread($this->Memo_Handle, 4));
if($Type[1] == "1") {
$Len = unpack("N", fread($this->Memo_Handle, 4));
$Value = trim(fread($this->Memo_Handle, $Len[1]));
} else {
// Pictures will not be shown
$Value = "{BINARY_PICTURE}";
}
} else {
$Value = "{NO_MEMO_FILE_OPEN}";
}
} else {
$Value = trim($RawData);
}
if($FieldCaptions) {
$Record[$Field["Name"]] = $Value;
} else {
$Record[] = $Value;
}
}
$Return = $Record;
}
return $Return;
}
function __destruct() {
// Cleanly close any open files before destruction
$this->Initialize();
}
}
The class can be used like this:
$Test = new Prodigy_DBF("customer.DBF", "customer.FPT");
while(($Record = $Test->GetNextRecord(true)) and !empty($Record)) {
print_r($Record);
}
It might not be an almighty perfect class, but it works for me. Feel free to use this code, but note that the class is VERY tolerant - it doesn't care if fread() and fseek() return true or anything else - so you might want to improve it a bit before using.
Also note that there are many private variables like number of records, recordsize etc. which are not used at the moment.

I replaced this code:
fseek($this->Memo_Handle, (trim($RawData) * $this->Memo_BlockSize)+8);
$Value = trim(fread($this->Memo_Handle, $this->Memo_BlockSize));
with this code:
fseek($this->Memo_Handle, (trim($RawData) * $this->Memo_BlockSize)+4);
$Len = unpack("N", fread($this->Memo_Handle, 4));
$Value = trim(fread($this->Memo_Handle, $Len[1] ));
this helped for me

Although not PHP, VFP is 1-based references and I think PHP is zero-based references so you'll have to decypher and adjust accordingly, but this works and hopefully you'll be able
to post your version of this portion when finished.
FILETOSTR() in VFP will open a file, and read the entire content into
a single memory variable as a character string -- all escape keys, high byte characters, etc, intact. You'll probably need to rely on an FOPEN(), FSEEK(), FCLOSE(), etc.
MemoTest.FPT was my sample memo table/file
fpt1 = FILETOSTR( "MEMOTEST.FPT" )
First, you'll have to detect the MEMO BLOCK SIZE used when the file was created. Typically this would be 64 BYTES, but per the link you had in your post.
The header positions 6-7 identify the size (VFP, positions 7 and 8). The first byte is the high-order
nBlockSize = ASC( SUBSTR( fpt1, 7, 1 )) * 256 + ASC( SUBSTR( fpt1, 8, 1 ))
Now, at your individual records. Wherever in your DBF structure has the memo FIELD (and you can have many per single record structure) there will be 4 bytes. In THE RECORD field, it identifies the "block" in the memo file where the content is stored.
MemoBytes = 4 bytes at your identified field location. These will be stored as ASCII from 0-255. This field is stored with the FIRST byte as low-order and the 4th byte as 256^3 = 16777216. The first "Block" ever used will be starting in position offset of 512 per the memo .fpt file spec that the header takes up positions 0-511.
So, if your first memo field has a content of "8000" where the 8 is the actual 0x08, not number "8" which is 0x38, and the zeros are 0x00.
YourMemoField = "8000" (actually use ascii, but for readability showing hex expected value)
First Byte is ASCII value * 1 ( 256 ^ 0 )
Second Byte is ASCII value * 256 (256 ^ 1)
Third Byte is ASCII value * 65536 (256 ^ 2)
Fourth Byte is ASCII value * 16777216 (256 ^ 3)
nMemoBlock = byte1 + ( byte2 * 256 ) + ( byte3 * 65536 ) + ( byte4 * 16777216 )
Now, you'll need to FSEEK() to the
FSEEK( handle, nMemoBlock * nBlockSize +1 )
for the first byte of the block you are looking for. This will point to the BLOCK header. In this case, per the spec, the first 4 bytes identify the Block SIGNATURE, the second 4 bytes is the length of the content. For these two, the bytes are stored with HIGH-BYTE first.
From your FSEEK(), its REVERSE of the nMemoBlock above with the high-byte. The "Byte1-4" here are from your FSEEK() position
nSignature = ( byte1 * 16777216 ) + ( byte2 * 65536 ) + ( byte3 * 256 ) + byte4
nMemoLength = ( byte5 * 16777216 ) + ( byte6 * 65536 ) + ( byte7 * 256 ) + byte8
Now, FSEEK() to the 9th byte (1st actual character of the data AFTER the 8 bytes of the header you just read for signature and memo length). This is the beginning of your data.
Now, read the rest of the content...
FSEEK() +9 characters to new position
cFinalMemoData = FREAD( handle, nMemoLength )
I know this isn't perfect, nor PHP script, but its enough of pseudo-code on hOW things are stored and hopefully gets you WELL on your way.
Again, PLEASE take into consideration as you are stepping through your debug process to ensure 0 or 1 offset basis. To help simplify and test this, I created a simple .DBF with 2 fields... a character field and a memo field, added a few records and some basic content to confirm all content, positions, etc.

I think it's unlikely there are FoxPro libraries in PHP.
You may have to code it from scratch. For byte-wise reading, meet fopen() fread() and colleagues.
Edit: There seems to be a Visual FoxPro ODBC driver. You may be able to connect to a FoxPro database through PDO and that connector. How the chances of success are, and how much work it would be, I don't know.

The FPT file contains memo data. In the DBF you have columns of type memo, and the information in this column is a pointer to the entry in the FPT file.
If you are querying the data from the table you only have to reference the memo column to get the data. You do not need to parse the data out of the FPT file separately. The OLE DB driver (or the ODBC driver if your files are VFP 6 or earlier) should just give you this information.
There is a tool that will automatically migrate your Visual FoxPro data to MySQL. You might want to check it out to see if you can save some time:
Go to http://leafe.com/dls/vfp
and search for "Stru2MySQL_2" for the tool to migrate the structures of the data and "VFP2MySQL Data Upload program" for tools to help with the migration.
Rick Schummer VFP MVP

You also might want to check the PHP dbase libraries.They work quite well with DBF files.

<?
class Prodigy_DBF {
private $Filename, $DB_Type, $DB_Update, $DB_Records, $DB_FirstData, $DB_RecordLength, $DB_Flags, $DB_CodePageMark, $DB_Fields, $FileHandle, $FileOpened;
private $Memo_Handle, $Memo_Opened, $Memo_BlockSize;
private function Initialize() {
if($this->FileOpened) {
fclose($this->FileHandle);
}
if($this->Memo_Opened) {
fclose($this->Memo_Handle);
}
$this->FileOpened = false;
$this->FileHandle = NULL;
$this->Filename = NULL;
$this->DB_Type = NULL;
$this->DB_Update = NULL;
$this->DB_Records = NULL;
$this->DB_FirstData = NULL;
$this->DB_RecordLength = NULL;
$this->DB_CodePageMark = NULL;
$this->DB_Flags = NULL;
$this->DB_Fields = array();
$this->Memo_Handle = NULL;
$this->Memo_Opened = false;
$this->Memo_BlockSize = NULL;
}
public function __construct($Filename, $MemoFilename = NULL) {
$this->Prodigy_DBF($Filename, $MemoFilename);
}
public function Prodigy_DBF($Filename, $MemoFilename = NULL) {
$this->Initialize();
$this->OpenDatabase($Filename, $MemoFilename);
}
public function OpenDatabase($Filename, $MemoFilename = NULL) {
$Return = false;
$this->Initialize();
$this->FileHandle = fopen($Filename, "r");
if($this->FileHandle) {
// DB Open, reading headers
$this->DB_Type = dechex(ord(fread($this->FileHandle, 1)));
$LUPD = fread($this->FileHandle, 3);
$this->DB_Update = ord($LUPD[0])."/".ord($LUPD[1])."/".ord($LUPD[2]);
$Rec = unpack("V", fread($this->FileHandle, 4));
$this->DB_Records = $Rec[1];
$Pos = fread($this->FileHandle, 2);
$this->DB_FirstData = (ord($Pos[0]) + ord($Pos[1]) * 256);
$Len = fread($this->FileHandle, 2);
$this->DB_RecordLength = (ord($Len[0]) + ord($Len[1]) * 256);
fseek($this->FileHandle, 28); // Ignoring "reserved" bytes, jumping to table flags
$this->DB_Flags = dechex(ord(fread($this->FileHandle, 1)));
$this->DB_CodePageMark = ord(fread($this->FileHandle, 1));
fseek($this->FileHandle, 2, SEEK_CUR); // Ignoring next 2 "reserved" bytes
// Now reading field captions and attributes
while(!feof($this->FileHandle)) {
// Checking for end of header
if(ord(fread($this->FileHandle, 1)) == 13) {
break; // End of header!
} else {
// Go back
fseek($this->FileHandle, -1, SEEK_CUR);
}
$Field["Name"] = trim(fread($this->FileHandle, 11));
$Field["Type"] = fread($this->FileHandle, 1);
fseek($this->FileHandle, 4, SEEK_CUR); // Skipping attribute "displacement"
$Field["Size"] = ord(fread($this->FileHandle, 1));
fseek($this->FileHandle, 15, SEEK_CUR); // Skipping any remaining attributes
$this->DB_Fields[] = $Field;
}
// Setting file pointer to the first record
fseek($this->FileHandle, $this->DB_FirstData);
$this->FileOpened = true;
// Open memo file, if exists
if(!empty($MemoFilename) and file_exists($MemoFilename) and preg_match("%^(.+).fpt$%i", $MemoFilename)) {
$this->Memo_Handle = fopen($MemoFilename, "r");
if($this->Memo_Handle) {
$this->Memo_Opened = true;
// Getting block size
fseek($this->Memo_Handle, 6);
$Data = unpack("n", fread($this->Memo_Handle, 2));
$this->Memo_BlockSize = $Data[1];
}
}
}
return $Return;
}
public function GetNextRecord($FieldCaptions = false) {
$Return = NULL;
$Record = array();
if(!$this->FileOpened) {
$Return = false;
} elseif(feof($this->FileHandle)) {
$Return = NULL;
} else {
// File open and not EOF
fseek($this->FileHandle, 1, SEEK_CUR); // Ignoring DELETE flag
foreach($this->DB_Fields as $Field) {
$RawData = fread($this->FileHandle, $Field["Size"]);
// Checking for memo reference
if($Field["Type"] == "M" and $Field["Size"] == 4 and !empty($RawData)) {
// Binary Memo reference
$Memo_BO = unpack("V", $RawData);
if($this->Memo_Opened and $Memo_BO != 0) {
fseek($this->Memo_Handle, $Memo_BO[1] * $this->Memo_BlockSize);
$Type = unpack("N", fread($this->Memo_Handle, 4));
if($Type[1] == "1") {
$Len = unpack("N", fread($this->Memo_Handle, 4));
$Value = trim(fread($this->Memo_Handle, $Len[1]));
} else {
// Pictures will not be shown
$Value = "{BINARY_PICTURE}";
}
} else {
$Value = "{NO_MEMO_FILE_OPEN}";
}
} else {
if($Field["Type"] == "M"){
if(trim($RawData) > 0) {
fseek($this->Memo_Handle, (trim($RawData) * $this->Memo_BlockSize)+8);
$Value = trim(fread($this->Memo_Handle, $this->Memo_BlockSize));
}
}else{
$Value = trim($RawData);
}
}
if($FieldCaptions) {
$Record[$Field["Name"]] = $Value;
} else {
$Record[] = $Value;
}
}
$Return = $Record;
}
return $Return;
}
function __destruct() {
// Cleanly close any open files before destruction
$this->Initialize();
}
}
?>

Related

Sphinx Query is returning null results in PHP

I am currently using PHP to query from my sphinx index. The index is building properly, however the search is not.
Originally I had my query set up like the following:
private function _doMapSearchAll($textSearch, $typeFilterArr, $rads = array(), $centre = array(), $showFavourites = false) {
// lookup the location based on the search and return the results in that
// map bounds ignore for boundary searches
$rsArr = array();
$skipDataSearch = false;
if (!COUNT($rads)) {
// handle airport code lookups if we have 3 characters
if (strlen($textSearch) === 3) {
$rsArr = $this->_doMapAirportSearch($textSearch);
if (COUNT($rsArr)) {
$skipDataSearch = true;
}
}
// still no results based on the above search, do generic map search
if ($skipDataSearch === false) {
$rsArr = $this->_doMapGeoSearch($textSearch, $typeFilterArr, $centre);
if (COUNT($rsArr['results']) > 0) {
$skipDataSearch = false;
}
}
}
// if we are doing a boundary search, or we have no results from the above,
// get all results based on our location
if ($skipDataSearch === false) {
// fall back onto searching via the data
// normalise the search string
$originalSearchString = $this->doNormalisation($textSearch);
// use sphinx for the search
$sphinx = $this->_getSphinxConnection();
$sphinx->setLimits(0, $this->container->getParameter('search_max_results'), $this->container->getParameter('search_max_results'));
if (COUNT($typeFilterArr)) {
$sphinx->SetFilter('fldSiteTypeUID_attr', $typeFilterArr);
}
// if we're doing a boundary search, skip the search string
if (COUNT($rads) > 0) {
$originalSearchString = '';
if ($showFavourites === false) {
//$sphinx->SetFilterFloatRange('lat_radians', min($rads['minLat'], $rads['maxLat']), max($rads['minLat'], $rads['maxLat']));
//$sphinx->SetFilterFloatRange('long_radians', min($rads['minLon'], $rads['maxLon']), max($rads['minLon'], $rads['maxLon']));
if ($rads['minLat'] > $rads['maxLat']) {
$rads['minLat'] = $rads['minLat'] - 360;
}
if ($rads['minLon'] > $rads['maxLon']) {
$rads['minLon'] = $rads['minLon'] - 180;
}
$sphinx->SetFilterFloatRange('lat_radians', $rads['minLat'], $rads['maxLat']);
$sphinx->SetFilterFloatRange('long_radians', $rads['minLon'], $rads['maxLon']);
}
}
// order by centre point
if (COUNT($centre) > 0) {
// otherwise start in the centre
$sphinx->SetGeoAnchor('lat_radians', 'long_radians', (float) $centre['centreLat'], (float) $centre['centreLon']);
$lintDistanceLimit = 999999999.0; // everything
if ($showFavourites === false && isset($rads['maxLat'])) {
$radiusMiles = $this->get('geolocation_helper')->getSeparation(
rad2deg($rads['maxLat']),
rad2deg($rads['minLon']),
rad2deg($rads['minLat']),
rad2deg($rads['maxLon']),
"M"
);
$lintDistanceLimit = $radiusMiles * 1609; // miles to meters...
}
$sphinx->SetFilterFloatRange('geodist', 0.0, (float)$lintDistanceLimit);
$sphinx->SetSortMode(SPH_SORT_EXTENDED, 'geodist ASC');
} else {
// apply search weights
$sphinx->SetFieldWeights(array('fldTown_str' => 100, 'fldAddress1_str' => 30, 'fldSiteName_str' => 20));
}
// if we should be limiting to only favourites, pickup the selected
// favourites from cookies
if ($showFavourites === true) {
$request = $this->container->get('request_stack')->getCurrentRequest();
$favourites = explode(',', $request->cookies->get('HSF_favlist'));
if (count($favourites)) {
foreach ($favourites as $k => $favourite) {
$favourites[$k] = (int)$favourite;
}
$sphinx->SetFilter('fldUID_attr', $favourites);
}
}
$rs = $sphinx->Query($originalSearchString, $this->container->getParameter('sphinx_index'));
echo json_encode($sphinx);
$rsArr['results'] = array();
if (isset($rs['matches']) && count($rs['matches'])) {
// update the search text to what we actually searched for, this
// is needed in case we've updated/set $rsArr['searchText']
// after a geolookup
$rsArr['searchText'] = $textSearch;
// clear any previous bounds set by the geolookup, we don't want this
// if we have direct text match results
$rsArr['geobounds'] = array();
if (isset($rs['matches']) && count($rs['matches'])) {
foreach ($rs['matches'] as $k => $match) {
$rsArr['results'][$k] = $this->_remapSphinxData($k, $match);
}
}
// sort the results by distance
usort($rsArr['results'], function ($a, $b) {
return $a['fldDistance'] - $b['fldDistance'];
});
}
}
// add on the total record count
$rsArr['total'] = (int) COUNT($rsArr['results']);
return $rsArr;
}
That was returning nothing and giving me the error for GEODIST():
{"_host":"sphinx","_port":36307,"_path":"","_socket":false,"_offset":0,"_limit":250,"_mode":0,"_weights":[],"_sort":4,"_sortby":"geodist ASC","_min_id":0,"_max_id":0,"_filters":[{"type":2,"attr":"geodist","exclude":false,"min":0,"max":999999999}],"_groupby":"","_groupfunc":0,"_groupsort":"#group desc","_groupdistinct":"","_maxmatches":"250","_cutoff":0,"_retrycount":0,"_retrydelay":0,"_anchor":{"attrlat":"lat_radians","attrlong":"long_radians","lat":0.9300859583877783,"long":-2.0943951023931953},"_indexweights":[],"_ranker":0,"_rankexpr":"","_maxquerytime":0,"_fieldweights":[],"_overrides":[],"_select":"*","_error":"searchd error: geoanchor is deprecated (and slow); use GEODIST() expression","_warning":"","_connerror":false,"_reqs":[],"_mbenc":"","_arrayresult":false,"_timeout":0}{"results":[],"bounds":{"bllat":53.395603,"trlat":53.7159857,"bllng":-113.7138017,"trlng":-113.2716433},"geocentre":{"lat":53.5461245,"lon":-113.4938229},"checksum":"204a43923452936b00a10c8e566c4a48d4fdb280f97fd4042646eb45c8257bbc","searchText":"Edmonton, AB, Canada","skipResults":true,"showFavourites":false}
To fix this I changed the SetGeoAnchor function to the following:
private function _doMapSearchAll($textSearch, $typeFilterArr, $rads = array(), $centre = array(), $showFavourites = false) {
// lookup the location based on the search and return the results in that
// map bounds ignore for boundary searches
$rsArr = array();
$skipDataSearch = false;
if (!COUNT($rads)) {
// handle airport code lookups if we have 3 characters
if (strlen($textSearch) === 3) {
$rsArr = $this->_doMapAirportSearch($textSearch);
if (COUNT($rsArr)) {
$skipDataSearch = true;
}
}
// still no results based on the above search, do generic map search
if ($skipDataSearch === false) {
$rsArr = $this->_doMapGeoSearch($textSearch, $typeFilterArr, $centre);
if (COUNT($rsArr['results']) > 0) {
$skipDataSearch = false;
}
}
}
// if we are doing a boundary search, or we have no results from the above,
// get all results based on our location
if ($skipDataSearch === false) {
// fall back onto searching via the data
// normalise the search string
$originalSearchString = $this->doNormalisation($textSearch);
// use sphinx for the search
$sphinx = $this->_getSphinxConnection();
$sphinx->setLimits(0, $this->container->getParameter('search_max_results'), $this->container->getParameter('search_max_results'));
if (COUNT($typeFilterArr)) {
$sphinx->SetFilter('fldSiteTypeUID_attr', $typeFilterArr);
}
// if we're doing a boundary search, skip the search string
if (COUNT($rads) > 0) {
$originalSearchString = '';
if ($showFavourites === false) {
//$sphinx->SetFilterFloatRange('lat_radians', min($rads['minLat'], $rads['maxLat']), max($rads['minLat'], $rads['maxLat']));
//$sphinx->SetFilterFloatRange('long_radians', min($rads['minLon'], $rads['maxLon']), max($rads['minLon'], $rads['maxLon']));
if ($rads['minLat'] > $rads['maxLat']) {
$rads['minLat'] = $rads['minLat'] - 360;
}
if ($rads['minLon'] > $rads['maxLon']) {
$rads['minLon'] = $rads['minLon'] - 180;
}
$sphinx->SetFilterFloatRange('lat_radians', $rads['minLat'], $rads['maxLat']);
$sphinx->SetFilterFloatRange('long_radians', $rads['minLon'], $rads['maxLon']);
}
}
// order by centre point
if (COUNT($centre) > 0) {
// otherwise start in the centre
// $sphinx->SetGeoAnchor('lat_radians', 'long_radians', (float) $centre['centreLat'], (float) $centre['centreLon']);
$centreLat = (float) $centre['centreLat'];
$centreLon = (float) $centre['centreLon'];
$sphinx->SetSelect("GEODIST(lat_radians, long_radians, $centreLat, $centreLon) AS geodist");
$lintDistanceLimit = 999999999.0; // everything
if ($showFavourites === false && isset($rads['maxLat'])) {
$radiusMiles = $this->get('geolocation_helper')->getSeparation(
rad2deg($rads['maxLat']),
rad2deg($rads['minLon']),
rad2deg($rads['minLat']),
rad2deg($rads['maxLon']),
"M"
);
$lintDistanceLimit = $radiusMiles * 1609; // miles to meters...
}
$sphinx->SetFilterFloatRange('geodist', 0.0, (float)$lintDistanceLimit);
$sphinx->SetSortMode(SPH_SORT_EXTENDED, 'geodist ASC');
} else {
// apply search weights
$sphinx->SetFieldWeights(array('fldTown_str' => 100, 'fldAddress1_str' => 30, 'fldSiteName_str' => 20));
}
// if we should be limiting to only favourites, pickup the selected
// favourites from cookies
if ($showFavourites === true) {
$request = $this->container->get('request_stack')->getCurrentRequest();
$favourites = explode(',', $request->cookies->get('HSF_favlist'));
if (count($favourites)) {
foreach ($favourites as $k => $favourite) {
$favourites[$k] = (int)$favourite;
}
$sphinx->SetFilter('fldUID_attr', $favourites);
}
}
$rs = $sphinx->Query($originalSearchString, $this->container->getParameter('sphinx_index'));
echo json_encode($sphinx);
$rsArr['results'] = array();
if (isset($rs['matches']) && count($rs['matches'])) {
// update the search text to what we actually searched for, this
// is needed in case we've updated/set $rsArr['searchText']
// after a geolookup
$rsArr['searchText'] = $textSearch;
// clear any previous bounds set by the geolookup, we don't want this
// if we have direct text match results
$rsArr['geobounds'] = array();
if (isset($rs['matches']) && count($rs['matches'])) {
foreach ($rs['matches'] as $k => $match) {
$rsArr['results'][$k] = $this->_remapSphinxData($k, $match);
}
}
// sort the results by distance
usort($rsArr['results'], function ($a, $b) {
return $a['fldDistance'] - $b['fldDistance'];
});
}
}
// add on the total record count
$rsArr['total'] = (int) COUNT($rsArr['results']);
return $rsArr;
}
This gives me back results however they look like this:
{"_host":"sphinx","_port":36307,"_path":"","_socket":false,"_offset":0,"_limit":250,"_mode":0,"_weights":[],"_sort":4,"_sortby":"geodist ASC","_min_id":0,"_max_id":0,"_filters":[{"type":2,"attr":"geodist","exclude":false,"min":0,"max":999999999}],"_groupby":"","_groupfunc":0,"_groupsort":"#group desc","_groupdistinct":"","_maxmatches":"250","_cutoff":0,"_retrycount":0,"_retrydelay":0,"_anchor":[],"_indexweights":[],"_ranker":0,"_rankexpr":"","_maxquerytime":0,"_fieldweights":[],"_overrides":[],"_select":"GEODIST(lat_radians, long_radians, 0.93008595838778, -2.0943951023932) AS geodist","_error":"","_warning":"","_connerror":false,"_reqs":[],"_mbenc":"","_arrayresult":false,"_timeout":0}{"results":[[null,null,null,null,null,null,null,"Other","http:\/\/localhost:8080\/themes\/docker\/images\/Site-Type-Other.png",null,34362776,null,""],[null,null,null,null,null,null,null,"Other","http:\/\/localhost:8080\/themes\/docker\/images\/Site-Type-Other.png",null,38279990,null,""],[null,null,null,null,null,null,null,"Other","http:\/\/localhost:8080\/themes\/docker\/images\/Site-Type-Other.png",null,7963188,null,""],[null,null,null,null,null,null,null,"Other","http:\/\/localhost:8080\/themes\/docker\/images\/Site-Type-Other.png",null,7971966,null,""],[null,null,null,null,null,null,null,"Other","http:\/\/localhost:8080\/themes\/docker\/images\/Site-Type-Other.png",null,31790051,null,""],[null,null,null,null,null,null,null,"Other","http:\/\/localhost:8080\/themes\/docker\/images\/Site-Type-Other.png",null,7972301,null,""],[null,null,null,null,null,null,null,"Other","http:\/\/localhost:8080\/themes\/docker\/images\/Site-Type-Other.png",null,33589292,null,""],[null,null,null,null,null,null,null,"Other","http:\/\/localhost:8080\/themes\/docker\/images\/Site-Type-Other.png",null,33589642,null,""],[null,null,null,null,null,null,null,"Other","http:\/\/localhost:8080\/themes\/docker\/images\/Site-Type-Other.png",null,7962913,null,""],[null,null,null,null,null,null,null,"Other","http:\/\/localhost:8080\/themes\/docker\/images\/Site-Type-Other.png",null,31789941,null,""],[null,null,null,null,null,null,null,"Other","http:\/\/localhost:8080\/themes\/docker\/images\/Site-Type-Other.png",null,7962178,null,""],[null,null,null,null,null,null,null,"Other","http:\/\/localhost:8080\/themes\/docker\/images\/Site-Type-Other.png",null,49484181,null,""],[null,null,null,null,null,null,null,"Other","http:\/\/localhost:8080\/themes\/docker\/images\/Site-Type-Other.png",null,31795436,null,""],
.....
My searchd config looks like this:
searchd
{
listen = 36307
listen = 9306:mysql41
log = /opt/sphinx/searchd.log
query_log = /opt/sphinx/query.log
read_timeout = 5
max_children = 30
pid_file = /opt/sphinx/searchd.pid
seamless_rotate = 1
preopen_indexes = 1
unlink_old = 1
binlog_path = /opt/sphinx/
}
I am new tp Sphinx and need help correcting this query. Any thoughts?
Edit: This is sphinx version 3.4.1
So after hours on this over the past few days, I have found the answer...:
I was missing the * in the statement:
$sphinx->SetSelect("*, GEODIST($centreLat, $centreLon, lat_radians, long_radians) as geodist");
I hope this helps anyone who has this issue in the future

Can't get actual filesize

I'm working on a class whose purpose is to restrict users to making only 10 requests within any 30 second period. It utilizes a file to maintain IP addresses, last request time. and the number of tries they've made. The problem is that, no matter what I try, I can't get the filesize. I've tried using clearstatcache(), and I've tried using a function I found in the comments on the filesize() page of the PHP manual.
Here's the code, in it's current debugging state.
// Makes sure user can only try to generate a coupon x number of times over x amount of seconds
class IpChecker{
const WAIT_TIME = 30; //seconds until user can try again
const MAX_TRIES = 10; // maximum tries
const COUPON_IP = 0;
const COUPON_TIME = 1;
const COUPON_TRIES = 2;
private $ip_data;
private $path;
private $fh;
private $safe;
public function __construct(){
clearstatcache();
$this->path = realpath(dirname(__FILE__))."/ips/.ips";
$this->fh = fopen($this->path,'w+');
$this->filesize = $this->realfilesize($this->fh);
echo "fs: ".$this->filesize; exit;
$this->getIPs();
$this->checkIP();
$this->logRequest();
fclose($this->fh);
$this->safe || die(json_encode("You have exhausted all available tries. Please try again later."));
}
private function logRequest(){
$str = "";
foreach($this->ip_data as $data){
foreach($data as $col){
if(self::WAIT_TIME < (time() - $col[self::COUPON_TIME])) $str .= $col."\t";
}
$str = rtrim($str, '\t');
$str .= "\n";
}
$str = rtrim($str, '\n');
try{
$fw = fwrite($this->fh, $str) || die(json_encode("Unable to check IP"));
}catch(Exception $e){
die(json_encode($e));
}
}
private function checkIP(){
$IP = $_SERVER['REMOTE_ADDR'];
$TIME = time();
$safe = true;
$user_logged = false;
echo "<pre>"; var_dump($this->ip_data); exit;
foreach($this->ip_data as $key=>$data){
echo "<prE>"; var_dump($data); exit;
// if($data[$key][self::COUPON_IP] == $IP){
// $user_logged = true;
// if(
// (($TIME - $data[$key][self::COUPON_TIME]) < self::WAIT_TIME) ||
// (self::MAX_TRIES >= $data[$key][self::COUPON_TRIES])
// ) $safe = false;
// $this->ip_data[$key][self::COUPON_TRIES] = $this->ip_data[$key][self::COUPON_TRIES]+1;
// $this->ip_data[$key][self::COUPON_TIME] = $TIME;
// }
}
if(!$user_logged){
die("user not logged");
$this->ip_data[] = array(
self::COUPON_IP => $IP,
self::COUPON_TIME => $TIME,
self::COUPON_TRIES => 1
);
}
$this->safe = $safe;
}
private function getIPs(){
$IP_DATA = array();
echo file_get_contents($this->path); exit;
// this always returns 0.
$size = filesize($this->path);
echo "filesize: ".$size; exit;
if($size){
$IPs = fread($this->fh,$size);
$IP_ARR = explode("\n",$IPs);
foreach($IP_ARR as $line) $IP_DATA[] = explode("\t",$line);
}
$this->ip_data = $IP_DATA;
}
// Copied from the comments in the PHP Manual for filesize()
public function realfilesize($fp) {
$return = false;
if (is_resource($fp)) {
if (PHP_INT_SIZE < 8) {
// 32bit
if (0 === fseek($fp, 0, SEEK_END)) {
$return = 0.0;
$step = 0x7FFFFFFF;
while ($step > 0) {
if (0 === fseek($fp, - $step, SEEK_CUR)) {
$return += floatval($step);
} else {
$step >>= 1;
}
}
}
} elseif (0 === fseek($fp, 0, SEEK_END)) {
// 64bit
$return = ftell($fp);
}
}
return $return;
}
}
How can I get the real filesize? I'm on PHP 5.2.
I wasn;t able to figure out what was wrong with my code, if anythign, but I worked around it like this:
/**
* Makes sure user can only access a given page
* x number of times over x amount of seconds.
* If multiple instances of this class are used, the $namespace
* properties for each should be unique.
* Default wait time is 90 seconds while default request limit
* is 10 tries.
*
* -+ Usage +-
* Just create the object with any parameters, no need to assign,
* just make sure it's initialized at the top of the page:
* new RequestThrottler;
*
* -+- Parameters -+-
* null RequestThrottler ( [ string $namespace [, int $WaitTime [, int $MaxTries ] ] ] )
*/
class RequestThrottler{
// settings
private static $WAIT_TIME; // seconds until count expires
private static $MAX_TRIES; // maximum tries
// used to keep session variables unique
// in the event that this class is used in multiple places.
private $namespace;
// for debugging
const DBG = false;
// array index constants
const _TIME = 0;
const _TRIES = 1;
// defines whether or not access is permitted
private $safe;
// seconds until reset
private $secs;
/**
* -+- Constructor -+-
* #param String $namespace - A unique prefix for SESSION data
* #param Int $WaitTime - Total seconds before user can try again
* #param Int $MaxTries - Total tries user can make until their request is denied
*/
public function __construct($namespace='Throttler', $WaitTime=90, $MaxTries=10){
// make sure a session is available
if(!headers_sent() && !isset($_SESSION)) session_start();
if(!isset($_SESSION)) die(json_encode("No session available"));
// save settings
$this->namespace = $namespace;
self::$MAX_TRIES = $MaxTries;
self::$WAIT_TIME = $WaitTime;
// do the footwork
$this->checkHistory();
// if set to debug mode, print a short helpful string
if(self::DBG) die(json_encode(
"You are ".($this->safe ? 'SAFE' : 'NOT SAFE')."! "
. "This is try number {$_SESSION[$this->namespace.'_ATTEMPTS'][self::_TRIES]} of ".self::$MAX_TRIES.". "
. $this->secs." seconds since last attempt. "
. (self::$WAIT_TIME - $this->secs)." seconds until reset."
));
// if not safe, kill the script, show a message
$this->safe || die(json_encode("You're going too fast. Please try again in ".(self::$WAIT_TIME - $this->secs)." seconds."));
}
/**
* -+- checkHistory -+-
* Does the footwork to determine whether
* or not to throttle the current user/request.
*/
private function checkHistory(){
$TIME = time();
$safe = true;
// make sure session is iniitialized
if( !isset($_SESSION[$this->namespace.'_ATTEMPTS']) ||
!isset($_SESSION[$this->namespace.'_ATTEMPTS'][self::_TRIES]) ||
!isset($_SESSION[$this->namespace.'_ATTEMPTS'][self::_TIME]) )
$_SESSION[$this->namespace.'_ATTEMPTS'] = array(
self::_TIME =>$TIME,
self::_TRIES => 1
);
else $_SESSION[$this->namespace.'_ATTEMPTS'][self::_TRIES] =
$_SESSION[$this->namespace.'_ATTEMPTS'][self::_TRIES]+1;
// get seconds since last attempt
$secondSinceLastAttempt = $TIME - $_SESSION[$this->namespace.'_ATTEMPTS'][self::_TIME];
// reset the counter if the wait time has expired
if($secondSinceLastAttempt > self::$WAIT_TIME){
$_SESSION[$this->namespace.'_ATTEMPTS'][self::_TIME] = $TIME;
$_SESSION[$this->namespace.'_ATTEMPTS'][self::_TRIES] = 1;
$secondSinceLastAttempt = 0;
}
// finally, determine if we're safe
if($_SESSION[$this->namespace.'_ATTEMPTS'][self::_TRIES] >= self::$MAX_TRIES) $safe=false;
// log this for debugging
$this->secs = $secondSinceLastAttempt;
// save the "safe" flag
$this->safe = $safe;
}
}

Python socket recv (client-side)

i have to connect to an api via socket and send / recv some data.
the company send me a php-example-file with this code for reading data from the socket:
function readAnswer() {
$size = fgets($this->socketPtr, 64);
$answer = "";
$readed = 0;
while($readed < $size) {
$part = fread($this->socketPtr, $size - $readed);
$readed += strlen($part);
$answer .= $part;
}
return $answer;
}
This works for me. But in python i get from times to times an error.
not everything from the socket is recv.
my python try looks like this:
def read_answer(self,the_socket,timeout=0.5):
the_socket.setblocking(0)
total_data=[]
data=''
begin=time.time()
while 1:
if total_data and time.time()-begin > timeout:
break
elif time.time()-begin > timeout*2:
break
try:
data = the_socket.recv(8192)
if data:
total_data.append(data)
begin=time.time()
else:
time.sleep(0.1)
except:
pass
return ''.join(total_data)
i recv data as a dict / array. and from time to time i only get a int (msg length i think)
so what would be a better way to read the data from socket.
ah the api sends the data in a correct way, i checked this. it's only this little function ;(
After using the code below (thanks falsetru) and added a readed=len(data) i run into another problem:
this is the working php code:
function _parse_answer($answerData)
{
$result = array();
$lines = explode("\n", $answerData);
$data = explode("&", $lines[0]);
foreach($data as $piece)
{
$keyval = explode("=", $piece, 2);
$result[$keyval[0]] = $keyval[1];
}
for($i=1;$i<count($lines);$i++)
{
$result["csv"][]=$lines[$i];
}
return $result;
}
and this my crappy python code:
def parse_answer(self,data):
#print "dd_demo_api: answer: (%s)" % (data)
if data:
result = {}
lines = data.split("\n")
index_list = 0
if len(lines) == 1:
index_list = 0
else:
index_list = 1
pieces = lines[index_list].split("&")
for x in pieces:
keyval = x.split("=")
result[keyval[0]] = keyval[1]
iterlines = iter(lines)
next(iterlines)
next(iterlines)
count = 1
result["csv"] = {}
for x in iterlines:
result["csv"][count] = x.split(";")
return result
else:
return 0
i think here is some optimization required? ;(
Python version does not do the same thing with PHP version.
Try following code:
def read_answer(self, sock):
size = int(sock.recv(64).strip().rstrip('\0'))
# Above is not exactly same as `fgets`.
# If that causes an issue, use following instead.
#
# f = sock.makefile('r')
# size = int(f.readline(64).rstrip('\0'))
#
# and replace `sock.recv(n)` with `f.read(n)` in the following loop.
total_data = []
readed = 0
while readed < size:
data = sock.recv(size - readed)
if data:
total_data.append(data)
readed += len(data)
return b''.join(total_data)

Programmatically reading messages from systemd's journal

Update, 2013-09-12:
I've dug a bit deeper into systemd and it's journal, and, I've stumbled upon this, that states:
systemd-journald will forward all received log messages to the AF_UNIX SOCK_DGRAM socket /run/systemd/journal/syslog, if it exists, which may be used by Unix syslog daemons to process the data further.
As per manpage, I did set up my environment to also have syslog underneath, I've tweaked my code accordingly:
define('NL', "\n\r");
$log = function ()
{
if (func_num_args() >= 1)
{
$message = call_user_func_array('sprintf', func_get_args());
echo '[' . date('r') . '] ' . $message . NL;
}
};
$syslog = '/var/run/systemd/journal/syslog';
$sock = socket_create(AF_UNIX, SOCK_DGRAM, 0);
$connection = socket_connect($sock, $syslog);
if (!$connection)
{
$log('Couldn\'t connect to ' . $syslog);
}
else
{
$log('Connected to ' . $syslog);
$readables = array($sock);
socket_set_nonblock($sock);
while (true)
{
$read = $readables;
$write = $readables;
$except = $readables;
$select = socket_select($read, $write, $except, 0);
$log('Changes: %d.', $select);
$log('-------');
$log('Read: %d.', count($read));
$log('Write: %d.', count($write));
$log('Except: %d.', count($except));
if ($select > 0)
{
if ($read)
{
foreach ($read as $readable)
{
$data = socket_read($readable, 4096, PHP_BINARY_READ);
if ($data === false)
{
$log(socket_last_error() . ': ' . socket_strerror(socket_last_error()));
}
else if (!empty($data))
{
$log($data);
}
else
{
$log('Read empty.');
}
}
}
if ($write)
{
foreach ($write as $writable)
{
$data = socket_read($writable, 4096, PHP_BINARY_READ);
if ($data === false)
{
$log(socket_last_error() . ': ' . socket_strerror(socket_last_error()));
}
else if (!empty($data))
{
$log($data);
}
else
{
$log('Write empty.');
}
}
}
}
}
}
This apparently, only sees (selects) changes on write sockets. Well, might be that something here is wrong so I attempted to read from them, no luck (nor there should be):
[Thu, 12 Sep 2013 14:45:15 +0300] Changes: 1.
[Thu, 12 Sep 2013 14:45:15 +0300] -------
[Thu, 12 Sep 2013 14:45:15 +0300] Read: 0.
[Thu, 12 Sep 2013 14:45:15 +0300] Write: 1.
[Thu, 12 Sep 2013 14:45:15 +0300] Except: 0.
[Thu, 12 Sep 2013 14:45:15 +0300] 11: Resource temporarily unavailable
Now, this drives me nuts a little. syslog documentation says this should be possible. What is wrong with the code?
Original:
I had a working prototype, by simply:
while(true)
{
exec('journalctl -r -n 1 | more', $result, $exit);
// do stuff
}
But this feels wrong, and consumes too much system resources, then I found out about journald having sockets.
I have attempted to connect and read from:
AF_UNIX, SOCK_DGRAM : /var/run/systemd/journal/socket
AF_UNIX, SOCK_STREAM : /var/run/systemd/journal/stdout
the given sockets.
With /var/run/systemd/journal/socket, socket_select sees 0 changes. With /var/run/systemd/journal/stdout I always (every loop) get 1 change, with 0 byte data.
This is my "reader":
<?php
define('NL', "\n\r");
$journal = '/var/run/systemd/journal/socket';
$jSTDOUT = '/var/run/systemd/journal/stdout';
$journal = $jSTDOUT;
$sock = socket_create(AF_UNIX, SOCK_STREAM, 0);
$connection = #socket_connect($sock, $journal);
$log = function ($message)
{
echo '[' . date('r') . '] ' . $message . NL;
};
if (!$connection)
{
$log('Couldn\'t connect to ' . $journal);
}
else
{
$log('Connected to ' . $journal);
$readables = array($sock);
while (true)
{
$read = $readables;
if (socket_select($read, $write = NULL, $except = NULL, 0) < 1)
{
continue;
}
foreach ($read as $read_socket)
{
$data = #socket_read($read_socket, 1024, PHP_BINARY_READ);
if ($data === false)
{
$log('Couldn\'t read.');
socket_shutdown($read_socket, 2);
socket_close($read_socket);
$log('Server terminated.');
break 2;
}
$data = trim($data);
if (!empty($data))
{
$log($data);
}
}
}
$log('Exiting.');
}
Having no data in read socket(s), I assume I'm doing something wrong.
Question, idea:
My goal is to read the messages and upon some of them, execute a callback.
Could anyone point me into the right direction of how to programmatically read journal messages?
The sockets under /run/systemd/journal/ won't work for this – …/socket and …/stdout are actually write-only (i.e. used for feeding data into the journal) while the …/syslog socket is not supposed to be used by anything else than a real syslogd, not to mention journald does not send any metadata over it. (In fact, the …/syslog socket doesn't even exist by default – syslogd must actually listen on it, and journald connects to it.)
The official method is to read directly from the journal files, and use inotify to watch for changes (which is the same thing journalctl --follow and even tail -f /var/log/syslog use in place of polling). In a C program, you can use the functions from libsystemd-journal, which will do the necessary parsing and even filtering for you.
In other languages, you have three choices: call the C library; parse the journal files yourself (the format is documented); or fork journalctl --follow which can be told to output JSON-formatted entries (or the more verbose journal export format). The third option actually works very well, since it only forks a single process for the entire stream; I have written a PHP wrapper for it (see below).
Recent systemd versions (v193) also come with systemd-journal-gatewayd, which is essentially a HTTP-based version of journalctl; that is, you can get a JSON or journal-export stream at http://localhost:19531/entries. (Both gatewayd and journalctl even support server-sent events for accessing the stream from HTML 5 webpages.) However, due to obvious security issues, gatewayd is disabled by default.
Attachment: PHP wrapper for journalctl --follow
<?php
/* © 2013 Mantas Mikulėnas <grawity#gmail.com>
* Released under the MIT Expat License <https://opensource.org/licenses/MIT>
*/
/* Iterator extends Traversable {
void rewind()
boolean valid()
void next()
mixed current()
scalar key()
}
calls: rewind, valid==true, current, key
next, valid==true, current, key
next, valid==false
*/
class Journal implements Iterator {
private $filter;
private $startpos;
private $proc;
private $stdout;
private $entry;
static function _join_argv($argv) {
return implode(" ",
array_map(function($a) {
return strlen($a) ? escapeshellarg($a) : "''";
}, $argv));
}
function __construct($filter=[], $cursor=null) {
$this->filter = $filter;
$this->startpos = $cursor;
}
function _close_journal() {
if ($this->stdout) {
fclose($this->stdout);
$this->stdout = null;
}
if ($this->proc) {
proc_close($this->proc);
$this->proc = null;
}
$this->entry = null;
}
function _open_journal($filter=[], $cursor=null) {
if ($this->proc)
$this->_close_journal();
$this->filter = $filter;
$this->startpos = $cursor;
$cmd = ["journalctl", "-f", "-o", "json"];
if ($cursor) {
$cmd[] = "-c";
$cmd[] = $cursor;
}
$cmd = array_merge($cmd, $filter);
$cmd = self::_join_argv($cmd);
$fdspec = [
0 => ["file", "/dev/null", "r"],
1 => ["pipe", "w"],
2 => ["file", "/dev/null", "w"],
];
$this->proc = proc_open($cmd, $fdspec, $fds);
if (!$this->proc)
return false;
$this->stdout = $fds[1];
}
function seek($cursor) {
$this->_open_journal($this->filter, $cursor);
}
function rewind() {
$this->seek($this->startpos);
}
function next() {
$line = fgets($this->stdout);
if ($line === false)
$this->entry = false;
else
$this->entry = json_decode($line);
}
function valid() {
return ($this->entry !== false);
/* null is valid, it just means next() hasn't been called yet */
}
function current() {
if (!$this->entry)
$this->next();
return $this->entry;
}
function key() {
if (!$this->entry)
$this->next();
return $this->entry->__CURSOR;
}
}
$a = new Journal();
foreach ($a as $cursor => $item) {
echo "================\n";
var_dump($cursor);
//print_r($item);
if ($item)
var_dump($item->MESSAGE);
}

How could I parse gettext .mo files in PHP4 without relying on setlocale/locales at all?

I made a couple related threads but this is the one direct question that I'm seeking the answer for. My framework will use Zend_Translate if the php version is 5, otherwise I have to mimic the functionality for 4.
It seems that pretty much every implementation of gettext relies on setlocale or locales, I know there's a LOT of inconsistency across systems which is why I don't want to rely upon it.
I've tried a couple times to get the textdomain, bindtextdomain and gettext functions to work but I've always needed to invoke setlocale.
By the way, all the .mo files will be UTF-8.
Here's some reusable code to parse MO files in PHP, based on Zend_Translate_Adapter_Gettext:
<?php
class MoParser {
private $_bigEndian = false;
private $_file = false;
private $_data = array();
private function _readMOData($bytes)
{
if ($this->_bigEndian === false) {
return unpack('V' . $bytes, fread($this->_file, 4 * $bytes));
} else {
return unpack('N' . $bytes, fread($this->_file, 4 * $bytes));
}
}
public function loadTranslationData($filename, $locale)
{
$this->_data = array();
$this->_bigEndian = false;
$this->_file = #fopen($filename, 'rb');
if (!$this->_file) throw new Exception('Error opening translation file \'' . $filename . '\'.');
if (#filesize($filename) < 10) throw new Exception('\'' . $filename . '\' is not a gettext file');
// get Endian
$input = $this->_readMOData(1);
if (strtolower(substr(dechex($input[1]), -8)) == "950412de") {
$this->_bigEndian = false;
} else if (strtolower(substr(dechex($input[1]), -8)) == "de120495") {
$this->_bigEndian = true;
} else {
throw new Exception('\'' . $filename . '\' is not a gettext file');
}
// read revision - not supported for now
$input = $this->_readMOData(1);
// number of bytes
$input = $this->_readMOData(1);
$total = $input[1];
// number of original strings
$input = $this->_readMOData(1);
$OOffset = $input[1];
// number of translation strings
$input = $this->_readMOData(1);
$TOffset = $input[1];
// fill the original table
fseek($this->_file, $OOffset);
$origtemp = $this->_readMOData(2 * $total);
fseek($this->_file, $TOffset);
$transtemp = $this->_readMOData(2 * $total);
for($count = 0; $count < $total; ++$count) {
if ($origtemp[$count * 2 + 1] != 0) {
fseek($this->_file, $origtemp[$count * 2 + 2]);
$original = #fread($this->_file, $origtemp[$count * 2 + 1]);
$original = explode("\0", $original);
} else {
$original[0] = '';
}
if ($transtemp[$count * 2 + 1] != 0) {
fseek($this->_file, $transtemp[$count * 2 + 2]);
$translate = fread($this->_file, $transtemp[$count * 2 + 1]);
$translate = explode("\0", $translate);
if ((count($original) > 1) && (count($translate) > 1)) {
$this->_data[$locale][$original[0]] = $translate;
array_shift($original);
foreach ($original as $orig) {
$this->_data[$locale][$orig] = '';
}
} else {
$this->_data[$locale][$original[0]] = $translate[0];
}
}
}
$this->_data[$locale][''] = trim($this->_data[$locale]['']);
unset($this->_data[$locale]['']);
return $this->_data;
}
}
Ok, I basically ended up writing a mo file parser based on Zend's Gettext Adapter, as far as I know gettext is pretty much reliant upon the locale, so manually parsing the .mo file would save the hassle of running into odd circumstances with locale issues with setlocale. I also plan on parsing the Zend Locale data provided in the form of xml files.

Categories