How to parse dict output in a user friendly way in PHP? - php

I am trying to implement a dictionary-type service.
I send a request with php using cURL to dict.org with the dict protocol.
This is my code (which on its own works and may be helpful for future readers):
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "dict://dict.org/define:(hello):english:exact");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$definition = curl_exec($ch);
curl_close($ch);
echo $definition;
The server returns the definition, as expected, along with several headers (that I do not need). The response looks something like this:
220 miranda.org dictd 1.9.15/rf on Linux 2.6.26-2-686 <auth.mime> <29631663.31530.1250750274#miranda.org>
250 ok
150 3 definitions retrieved
151 "Hello" gcide "The Collaborative International Dictionary of English v.0.48"
Hello \Hel*lo"\, interj. & n.
An exclamation used as a greeting, to call attention, as an
exclamation of surprise, or to encourage one. This variant of
{Halloo} and {Holloo} has become the dominant form. In the
United States, it is the most common greeting used in
answering a telephone.
[1913 Webster +PJC]
(... some content removed)
.
250 ok [d/m/c = 3/0/162; 0.000r 0.000u 0.000s]
221 bye [d/m/c = 0/0/0; 0.000r 0.000u 0.000s]
I was wondering if:
a) Is there a way to specify to curl (or an option in the dict protocol) to not return all that extra information (i.e. 250 ok [d/m/c = 3/0/162; 0.000r...])
b) You probably noticed that the dict response returns information that is not displayed in the most user friendly way. I was wondering if anybody knew of any existing php library that will allow me to display this in a nicer way. Otherwise I'd have to code my own.
c) If this is not the way most dictionary websites retrieve their definitions, how do they do it? In my understanding the most comprehensive dictionary database is the one at dict.org (which supports the dict protocol and is where I am sending my cURL request).
Thank you!

Before I start, let me state that I don't know the specifics of the dict protocol.
I doubt that you'll be able to create a request that only delivers the text. The information you wish to discard looks like status information and is therefore useful.
The way I'd handle this is as follows:
Read the cURL response into an array so that each line is a separate entry. You could use explode() and split on the newline character (\n) to do this.
Iterate over the array, e.g. foreach ($response as $responseLine) {}
Perform a regex (or some other form of pattern matching) on $responseLine to find the definition. It looks like the lines of actual text are the only ones that don't start with a three-digit status code.
You may want to check which character set the dict protocol uses. I haven't mentioned any error handling, but that should be straightforward.
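The steps above can be sketched roughly as follows. This is only a sketch under assumptions drawn from the sample response in the question: a line starting with 151 announces a definition, and a line containing only "." ends it. The function name extractDictDefinitions is made up for illustration. Tracking the 151 / "." boundaries rather than skipping every line that starts with a digit avoids dropping definition text that happens to begin with a number.

```php
<?php
// Hypothetical helper: keep only the definition text from a raw DICT response.
// Assumes "151 ..." opens a definition and a lone "." closes it, as in the
// sample response shown in the question.
function extractDictDefinitions(string $response): array
{
    $definitions = [];
    $current = null; // null means we are not inside a definition block

    foreach (preg_split('/\r\n|\n|\r/', $response) as $line) {
        if ($current === null) {
            // Outside a block: ignore status lines until a 151 line appears.
            if (preg_match('/^151 /', $line)) {
                $current = [];
            }
            continue;
        }
        if ($line === '.') {
            // End of this definition's text.
            $definitions[] = trim(implode("\n", $current));
            $current = null;
            continue;
        }
        // Dot-stuffing: a text line that begins with "." is sent with the dot doubled.
        if (strncmp($line, '..', 2) === 0) {
            $line = substr($line, 1);
        }
        $current[] = $line;
    }

    return $definitions;
}
```

Calling extractDictDefinitions($definition) on the cURL result from the question should return one array entry per definition, without the 220/250/221 status lines.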

Related

Get all contents of stream socket with fgets() in blocking mode

In order to complete the handshaking for Websockets in ssl, the socket must be read in blocking mode. Using stream sockets, communication is done from the php backend with the (javascript) client using fwrite() and fgets(). In blocking mode, fgets() will wait until the next line comes in, and grab one line. Once the socket connection is made, the client sends the PHP some headers so that the handshake can be completed. The problem is, I can't think of a way to find where the end of the headers are, since the order depends on the browser being used.
I used this work around for chrome (since the sec-websocket-extensions line is the last header sent)
stream_set_blocking($lsSocketNew, true);
$lcHeader = "";
while($lcLine = fgets($lsSocketNew)){
$lcHeader .= $lcLine;
if(strstr($lcLine, "Sec-WebSocket-Extensions")){
break;
}
}
but this doesn't work in other browsers like firefox, where this header is the first one sent. :P
(I think fread() is supposed to do what I am looking for -- in blocking mode it is supposed to get "everything" on the socket when it comes in... but when I tried fread instead, it was returning a blank string. :P stream_get_contents() was the same )
Although I can't give you PHP-specific advice, there are a couple of things that you may want to consider:
I. What kind of "everything" are you looking for? There are no message borders in TCP so "everything in the stream" is equivalent to "random ordered amount of data". Unfortunately, you aren't going to magically read all HTTP headers and stop there.
II. Given point I, you have to find something that separates HTTP headers from an HTTP body. This is actually rather simple, because the headers end with a blank line. So, just read the data until you receive CRLF CRLF*. In PHP you will most probably see CRLF as \n, though this can depend on the OS.
III. If you're implementing websockets, using fgets is questionable, because the rest of the protocol (after the HTTP handshake) is binary. You may want to use PHP's dedicated sockets extension and socket_recv instead of fread. I can't say exactly how these two functions differ, but the socket_* functions are just a wrapper around BSD sockets, which are available in a wide variety of languages. Since they're mostly language agnostic, you will find more support and tutorials on the internet.
* Per the HTTP standard:
CR = <US-ASCII CR, carriage return (13)>
LF = <US-ASCII LF, linefeed (10)>
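Point II could look something like the sketch below: keep calling fgets() and stop at the first empty line, regardless of which header the browser sends last. The function name readHttpHeaders is hypothetical, and the code accepts either a bare CRLF or a bare LF as the blank line since it is not certain which one the stream will deliver.

```php
<?php
// Hypothetical helper: read an HTTP header block from a blocking stream.
// Stops at the first empty line, so it does not depend on header order.
function readHttpHeaders($stream): string
{
    $headers = '';
    while (($line = fgets($stream)) !== false) {
        // The header block ends at the first empty line (bare CRLF or bare LF).
        if (rtrim($line, "\r\n") === '') {
            break;
        }
        $headers .= $line;
    }
    return $headers;
}
```

With the socket from the question this would be called as readHttpHeaders($lsSocketNew) after stream_set_blocking($lsSocketNew, true). Anything after the blank line stays unread on the stream.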

length issues with solr select POST with cURL in PHP

I have a solr query that has been working perfectly:
$ch = curl_init();
$ch_searchURL = "$base_url/$collection/select?q=$s&wt=json&indent=true";
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_URL, $ch_searchURL);
$rawData = curl_exec($ch);
$json = json_decode($rawData,true);
Initially, my $s variable was literally one thing: e.g. ?q=name:brian, but my user base wanted the ability to search multiple things at once, so I started to build that in:
?q=name:("brian"+OR+"mike"+OR+"james"+OR+"emma"+OR+"luke")
It then got to the point where they wanted to search 5,000 things at once, which caused this method of building out the solr GET query to fail as the literal URL length was longer than the max allowed length of ~2,000, so I thought using a POST might work, which I accomplished by adding the following lines:
$ch_searchURL = "$base_url/$collection/select";
$multiline_q = "q=$s&wt=json&indent=true";
curl_setopt($ch, CURLOPT_POSTFIELDS, $multiline_q);
This seemed to allow me to search for around 500 items at a time - (which would still, in GET world, cause a URL length of around 4,000) - so better than the GET method, but once I go past that number of items, the solr query fails again.
Because I'm POSTing (maybe?), I don't get any error response from solr, so I don't know what's causing the query to fail, and I can't manually test the query in the browser because it's ~40,000 characters long and won't paste. If I do var_dump($rawData);, I see this:
string(238) " 05 " // or 04, or 08
I've used solr quite a bit with PHP & cURL, but always with the GET method. This is my first foray into using POST. Am I doing something wrong here? Am I just exceeding the actual amount of q options that I can ask solr to retrieve for me, regardless of the method?
Any light that anyone could shed on this would be helpful...
There is no limit on the Solr side - we regularly use Solr in a similar way.
You need to look at the settings for your servlet container (Tomcat, Jetty etc.) and increase the maximum POST size. Look up maxPostSize if you are using Tomcat and maxFormContentSize if you are using Jetty.
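Independently of the container limit, it may help to let PHP build and encode the POST body instead of concatenating it by hand, since unencoded quotes and spaces in a long q parameter are easy to get wrong. The following is a minimal sketch; buildSolrPostBody is a hypothetical name, and the field and values are taken from the question's example.

```php
<?php
// Hypothetical helper: build a URL-encoded POST body for a Solr select request.
function buildSolrPostBody(string $field, array $values): string
{
    $quoted = [];
    foreach ($values as $value) {
        // Escape backslashes first, then double quotes, and wrap the value in quotes.
        $escaped = str_replace(['\\', '"'], ['\\\\', '\\"'], (string) $value);
        $quoted[] = '"' . $escaped . '"';
    }
    $q = $field . ':(' . implode(' OR ', $quoted) . ')';

    // http_build_query() percent-encodes every parameter value.
    return http_build_query(['q' => $q, 'wt' => 'json', 'indent' => 'true']);
}
```

The returned string can be passed to CURLOPT_POSTFIELDS; when that option receives a string, cURL sends it as application/x-www-form-urlencoded, which is the body type the servlet container's form size limit applies to.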

Sending compressed text over Amazon SQS from PHP to NodeJS

I seem to be stuck at sending the compressed messages from PHP to NodeJS over Amazon SQS.
Over on the PHP side I have:
$SQS->sendMessage(Array(
'QueueUrl' => $queueUrl,
'MessageBody' => 'article',
'MessageAttributes' => Array(
'json' => Array(
'BinaryValue' => bzcompress(json_encode(Array('type'=>'article','data'=>$vijest))),
'DataType' => 'Binary'
)
)
));
NOTE 1: I also tried putting compressed data directly in the message, but the library gave me an error with some invalid byte data
On the Node side, I have:
body = decodeBzip(message.MessageAttributes.json.BinaryValue);
Where message is from sqs.receiveMessage() call and that part works since it worked for raw (uncompressed messages)
What I am getting is TypeError: improper format
I also tried using:
PHP - NODE
gzcompress() - zlib.inflateraw()
gzdeflate() - zlib.inflate()
gzencode() - zlib.gunzip()
And each of those pairs gave me their version of the same error (essentially, input data is wrong)
Given all that I started to suspect that an error is somewhere in message transmission
What am I doing wrong?
EDIT 1: It seems that the error is somewhere in transmission, since bin2hex() in php and .toString('hex') in Node return totally different values. It seems that Amazon SQS API in PHP transfers BinaryAttribute using base64 but Node fails to decode it. I managed to partially decode it by turning off automatic conversion in amazon aws config file and then manually decoding base64 in node but it still was not able to decode it.
EDIT 2: I managed to accomplish the same thing by using base64_encode() on the php side, and sending the base64 as a messageBody (not using MessageAttributes). On the node side I used new Buffer(messageBody,'base64') and then decodeBzip on that. It all works but I would still like to know why MessageAttribute is not working as it should. Current base64 adds overhead and I like to use the services as they are intended, not by work arounds.
This is what all the SQS libraries do under the hood. You can get the php source code of the SQS library and see for yourself. Binary data will always be base64 encoded (when using MessageAttributes or not, does not matter) as a way to satisfy the API requirement of having form-url-encoded messages.
I do not know how long the data in your $vijest is, but I am willing to bet that after zipping and then base64 encoding it will be bigger than before.
So my answer to you would be two parts (plus a third if you are really stubborn):
When looking at the underlying raw API it is absolutely clear that not using MessageAttributes does NOT add additional overhead from base64. Instead, using MessageAttributes adds some slight additional overhead because of the structure of the data enforced by the SQS php library. So not using MessageAttributes is clearly NOT a workaround and you should do it if you want to zip the data yourself and you got it to work that way.
Because of the nature of a http POST request it is a very bad idea to compress your data inside your application. Base64 overhead will likely nullify the compression advantage and you are probably better off sending plain text.
If you absolutely do not believe me or the API spec or the HTTP spec and want to proceed, then I would advise to send a simple short string 'teststring' in the BinaryValue parameter and compare what you sent with what you got. That will make it very easy to understand the transformations the SQS library is doing on the BinaryValue parameter.
gzcompress() would be decoded by zlib.Inflate(). gzdeflate() would be decoded by zlib.InflateRaw(). gzencode() would be decoded by zlib.Gunzip(). So out of the three you listed, two are wrong, but one should work.
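To make the pairing concrete, here is a small PHP-only sketch showing that the three encoders wrap the same DEFLATE data in different containers, which is why each needs its own decoder on the Node side. The payload is made-up sample data, and the last line illustrates the base64 growth mentioned earlier.

```php
<?php
// Made-up sample payload; the real data would be the JSON from the question.
$payload = str_repeat('{"type":"article"}', 50);

$zlib = gzcompress($payload); // zlib container  -> zlib.inflate() in Node
$raw  = gzdeflate($payload);  // raw DEFLATE     -> zlib.inflateRaw() in Node
$gzip = gzencode($payload);   // gzip container  -> zlib.gunzip() in Node

// Each PHP encoder also has exactly one matching PHP decoder.
$roundTripOk = gzuncompress($zlib) === $payload
    && gzinflate($raw) === $payload
    && gzdecode($gzip) === $payload;

// The containers differ in their leading bytes, so a mismatched decoder
// rejects the input as malformed.
$gzipMagic = bin2hex(substr($gzip, 0, 2));
$zlibFirstByte = bin2hex($zlib[0]);

// base64 turns every 3 input bytes into 4 output characters.
$base64Length = strlen(base64_encode($zlib));
```

Comparing $base64Length with strlen($payload) for real messages is a quick way to check whether compressing before base64 encoding still saves anything.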

PostgreSQL Base64 Image decode issue

I am having an issue converting an image stored as base64 in a PostgreSQL database into an image to display on a website. The data type is bytea and I need to get the data via cURL.
I am working with an API to connect to a client's stock system which returns XML data.
I know storing images this way in a DB is not a great idea but that's how the client's system works and it can't be changed as it is a part of an enterprise solution provided by a 3rd Party.
I'm using the following to query the DB for the PICTURE field from the PICTURE table where the PART = 01000015
$ch = curl_init();
$server = 'xxxxxx';
$select = 'PICTURE';
$from = 'picture';
$where = 'part';
$answer = '01000015';
$myquery = "SELECT+".$select."+FROM+".$from.'+WHERE+'.$where."+=+'".$answer."'";
//Define curl options in an array
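$options = array(CURLOPT_URL => "http://xx.xxx.xx.xx/GetSql?datasource=$server&query=$myquery+limit+1",
CURLOPT_PORT => 82, // the port is an integer
// CURLOPT_HEADER is only an on/off switch; request headers go in CURLOPT_HTTPHEADER
CURLOPT_HTTPHEADER => array("Content-Type: application/xml"),
CURLOPT_RETURNTRANSFER => TRUE
);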
//Set options against curl object
curl_setopt_array($ch, $options);
//Assign execution of curl object to a variable
$data = curl_exec($ch);
//Close curl object
curl_close($ch);
//Pass results to the SimpleXMLElement function
$xml = new SimpleXMLElement($data);
//Return String
echo $xml->row->picture;
The response I get from this is: System.Byte[]
Thus if I use base64_decode() in PHP I am obviously just decoding the string "System.Byte[]".
I am guessing that I need to use the DECODE() function in PostgreSQL to convert the data in the query? However, I've tried loads of combinations but I'm stuck. I've had a few downvotes for questions and I'm not too sure why so if this is a bad question I'm sorry, I just really need some help with this one.
(nb:I've replaced the IP and $server with xxxxx for security)
To explain further:
The client has a POS system which is based on ASP.NET and saves the data as XML files on the remote server. I have access to this data via an API which includes a SQL query function using HTTP/cURL defined as follows:
http://remoteserver:82/pos.asmx.GetSql?datasource=DATASOURCE&query=MYQUERY
So to get the field that contains the picture data I am currently using the above code.
The query is in the cURL URL, i.e. http://remoteserver:82/pos.asmx.GetSql?datasource=12345&query=SELECT+*+FROM+picture+WHERE+part+=+'01000015'
However, this returns System.Byte[] instead of encoded data which I can then decode in PHP.
Additional info:
PostgreSQL version: PostgreSQL 9.1.3 on i686-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-51), 32-bit
Table Schema:
Available here: http://i.stack.imgur.com/sc8Gw.png
You should preferably have the server storing the data in PostgreSQL as a bytea field, then encoding to base64 to send to the client, but it sounds like you don't control the server.
The string System.Byte[] suggests it's an app using .NET, like ASP.NET or similar, and it's not correctly handling a bytea array. Instead of formatting it as base64 for output it's embedding the type name in the output.
You can't fix that on the client side, because the server is sending the wrong data.
You'll need to show the server-side tables and queries.
Update after query amended:
You're storing a bytea and returning it directly. The client doesn't seem to understand byte arrays and tries to output it naïvely, probably something like casting it to a string. Since the documentation says it expects "base64" you should probably provide that, instead of a byte array.
PostgreSQL has a handy function to base64-encode bytea data: encode.
Try:
SELECT
account, company, date_amended,
depot, keyfield, part,
encode(picture, 'base64') AS picture,
picture_size, source
FROM picture
WHERE part = '01000015'
The formatting isn't significant; it just makes the query easier to read here.
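Once the query returns base64 text instead of System.Byte[], the PHP side might decode it along these lines. This is a sketch that assumes the XML keeps the row/picture layout used in the question's code; pictureBytesFromXml is a hypothetical name. PostgreSQL's base64 output is broken into lines, so whitespace is stripped before strict decoding.

```php
<?php
// Hypothetical helper: extract the picture column from the XML and decode it.
// Assumes the layout <row><picture>...</picture></row> from the question.
function pictureBytesFromXml(string $xmlString): string
{
    $xml = new SimpleXMLElement($xmlString);
    $base64 = (string) $xml->row->picture;

    // Remove the line breaks that encode(..., 'base64') inserts, then decode strictly
    // so that a value such as "System.Byte[]" is reported instead of silently mangled.
    $bytes = base64_decode(preg_replace('/\s+/', '', $base64), true);
    if ($bytes === false) {
        throw new RuntimeException('picture column did not contain valid base64');
    }

    return $bytes;
}
```

The returned bytes can be written to a file or re-encoded into a data: URI for an img tag; the image MIME type is not stated in the question and would have to be determined separately.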

PHP Post Body Encoding Problems

I'm trying to mimic an application that sends octet streams to and from a server. The data contained in the body looks like raw bytes, and I'm fairly certain the data being sent for each command is static, so I'm hoping to map the bytes to something more readable in my application. For example, I'll have an array that does: "test" => "&^D^^&#*#dgkel" so I can call "test" and get the real bytes that need to be sent.
The trouble is that PHP seems to convert these bytes. I'm not sure if it is an encoding problem or something else, but what has been happening is that I'll give it some bytes (for example, �ھ����#�qs��������������������X����������������������������) which have a length of 67, I believe, but PHP will say (when I do a var_dump of the HTTP request) that the headers sent contained "Content-Length: 174" or something close to that, and the bytes will look like �ھ����#�qs��������������������X����������������������������
So I'm not really sure how to fix this. Does anyone have any ideas? Cheers!
Edit, a little PHP:
$request = new HttpRequest($this->GetMessageURL(), HTTP_METH_POST);
$request->addHeaders($headers);
$request->addRawPostData($buttonMapping[$button]);
$request->send();
