getting the lemma of a word using wordnet - php

How can I get the lemma for a given word using WordNet? I couldn't find what I want in the WordNet documentation: http://wordnet.princeton.edu/wordnet/man/wn.1WN.html
For example, for the word "books" I want to get "book"; likewise ashes => ash, booking => book, apples => apple, etc.
I want to achieve this using WordNet from the command line, but I can't find the exact options for this case.
A PHP solution would also be of great help, because I originally intended to use the WordNet PHP API, but the current one on their website doesn't seem to be working.

Morphy is a morphological processor native to WordNet. The WordNet interfaces invoke Morphy to lemmatize a word as part of the lookup process (e.g. you query "enlightened", it returns the results for both "enlightened" and, via Morphy, "enlighten").
The interfaces don't include a feature that allows a user to access Morphy directly, so using it from the command line is only possible if you write your own program using one of the WordNet APIs. You can find documentation for Morphy at the WordNet site.
As near as I can tell, the PHP interface is still available, although you may need to use WordNet 2.x.
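As a rough illustration of what Morphy does internally, the sketch below applies a few suffix-detachment rules and keeps the first candidate found in a word list. It is a simplification built on assumptions: the rule table is abbreviated from the rules described in the Morphy documentation, the tiny VOCAB set is a stand-in for the WordNet index, and the exception lists for irregular forms are left out entirely.

```python
# Simplified Morphy-style lemmatizer (illustrative only, not WordNet's code).
# VOCAB is a toy stand-in for the WordNet index files.
VOCAB = {"book", "ash", "apple", "enlighten"}

# (suffix, replacement) pairs, tried in order; abbreviated noun and verb rules.
RULES = [
    ("ches", "ch"), ("shes", "sh"), ("ses", "s"), ("xes", "x"),
    ("ies", "y"), ("s", ""),
    ("ing", ""), ("ing", "e"), ("ed", ""), ("ed", "e"),
]

def lemma(word, vocab=VOCAB):
    """Return the first rule-derived form found in vocab, else the word itself."""
    word = word.lower()
    if word in vocab:
        return word
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if candidate in vocab:
                return candidate
    return word

if __name__ == "__main__":
    for w in ["books", "ashes", "booking", "apples", "enlightened"]:
        print(w, "=>", lemma(w))
```

The dictionary check is what separates this from a plain stemmer: a candidate is accepted only if it is a known word.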

If you can use another tool, try TreeTagger.

I am not sure that WordNet implements it natively. NLTK has Morphy, which does precisely what you want, but it is implemented in Python. You can write a small Python program that takes input from the command line and returns the lemma.
Search for 'Morphy' in the following link:
http://nltk.googlecode.com/svn/trunk/doc/api/nltk.corpus.reader.wordnet.WordNetCorpusReader-class.html
nltk.WordNetLemmatizer() also does the job. Search for 'Lemmatization' in the following link:
http://nltk.googlecode.com/svn/trunk/doc/book/ch03.html
NLTK website : http://www.nltk.org/

The WordNetLemmatizer in the nltk library will do what you need. Here is Python 3 code:
#!Python3 -- this is lemmatize_s.py
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
# Requires NLTK's WordNet and tokenizer data to be installed (see nltk.download()).
# Note: lemmatize() treats each word as a noun unless you pass pos='v' etc.
lemmatizer = WordNetLemmatizer()  # create once and reuse for every input line
print("This program will lemmatize your input until you ask for it to 'end'.")
while True:
    sentence = input("Type one or more words (or 'end') and press enter:")
    if sentence == "end":
        break
    tokens = word_tokenize(sentence)
    output = [lemmatizer.lemmatize(word) for word in tokens]
    print(output)
Running this from the command line:
eyeMac2016:james$ python3 lemmatize_s.py
This program will lemmatize your input until you ask for it to 'end'.
Type one or more words (or 'end') and press enter:books ashes
['book', 'ash']
Type one or more words (or 'end') and press enter:end
eyeMac2016:james$

Related

Python PHP equivalent

I have been using PHP for a while now with my Apache2 web server on my raspberry pi. It works great, but I get tired of always having to think "how do I X in PHP" or "what was the function name for this in PHP".
I am under the strong impression that there should be something equivalent in which I can replace the <?php ?> code with python code, but my search results have been confusing at best.
I am essentially looking for something where I can write whatever python code I want in an HTML script and have it interpreted and executed and its output inserted into the page when it is requested.
For example, to make a table of users from a list in python.
<table><tr><td>User list</td></tr>
<?python
import json
library=json.load(open(some_json_file,'r'));
for user in library:
    print "<tr><td>"+user+"</td></tr>"
?>
</table>
I'm under the impression that Chameleon can do this with its code blocks, as described here (https://chameleon.readthedocs.io/en/latest/reference.html), but as I look deeper, I get the impression it doesn't work the way I think it should. I have gotten the same impression from all of the template engines I have looked at, as well as WSGI.
Are there good drop-in Python alternatives for PHP? Or are there ways to cleanly wrap semi-complex Python code into my PHP in a way that doesn't involve writing an additional Python script that is called by PHP? I've tried exec() with python -c, but this was less than ideal, since I had to escape all the ' and " characters...
Update
The code below works just fine, but can become very slow if run multiple times in a script (it takes about 0.4 seconds each time on a Raspberry Pi 3). I have written a program in Python that runs in the background and handles requests from PHP, and it runs about 15x faster. I'm now maintaining it here on GitHub.
Original Answer
After messing around I was able to come up with something mostly suitable for what I am trying to do. Inside my PHP I create a function that executes Python scripts.
<?php
function py($s){
    exec("python -c '$s'",$arr);
    foreach($arr as $v){
        echo $v."\n";
    }
}
?>
Then I use a PHP heredoc (the equivalent of Python's """, which means I don't have to escape every single double quote) to pass the script to the function:
<?php
py(<<<python
print "Hello world<br>"
s="ello world"
for x in s:
    print x+"<br>"
python
);
?>
outputs >>>
Hello world
e
l
l
o
w
o
r
l
d
The only real downside I am experiencing at this point is that this method precludes me from using single quotes anywhere in my Python script... :(. I'll get over it.
EDIT
I added a few more tweaks to make this even more useful. The new function is below:
<?php
function py($s,$return=false){
    // Escape single quotes so the script survives the shell's '...' quoting.
    $s=str_replace("'","'\''",$s);
    $h=<<<head
def cleanup():
    for x in globals().keys():
        if not x.startswith("_"):
            del globals()[x]
import dill
try:
    dill.load_session("pyworking.pkl")
except:
    pass
head;
    $f=<<<foot
import dill
dill.dump_session("pyworking.pkl")
foot;
    // Heredocs carry no trailing newline, so join the three parts with "\n".
    if ($return==false){
        echo shell_exec("python -c '$h\n$s\n$f'");
    }
    else {
        return shell_exec("python -c '$h\n$s\n$f'");
    }
}
?>
This allows you to use single quotes in the script and to invoke the py() function multiple times in the same script, and your variables and modules will follow you. At the end of the script you just call cleanup() (or clear the pyworking.pkl file using PHP) to wipe the environment clean.
I also put this function in a file and, in my php.ini, set auto_prepend_file=my/file/location to include it automatically, so there is no need to load it beforehand.
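The str_replace() call above works because, inside a single-quoted shell string, the sequence '\'' closes the quote, emits an escaped literal quote, and reopens the quote. Here is a minimal sketch of the same idea in Python, assuming a POSIX shell is available; the helper name sh_single_quote is hypothetical, and the standard library's shlex.quote() is the safer ready-made choice:

```python
import shlex
import subprocess

def sh_single_quote(text):
    """Wrap text in single quotes for a POSIX shell, escaping embedded quotes."""
    return "'" + text.replace("'", "'\\''") + "'"

script = "print('it works')"

# Hand-rolled quoting, mirroring the PHP str_replace() approach.
manual = subprocess.run(
    "printf %s " + sh_single_quote(script),
    shell=True, capture_output=True, text=True,
).stdout

# Standard-library quoting for comparison.
stdlib = subprocess.run(
    "printf %s " + shlex.quote(script),
    shell=True, capture_output=True, text=True,
).stdout

# Both should hand the script to the shell unchanged.
print(manual == script, stdlib == script)
```

Where it is an option, passing the script as an argument list without a shell avoids the quoting problem altogether.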
Overall I am very happy with this method, especially since I can read php variables inside my python script. Passing objects is as simple as:
<?php
$data_en=json_encode($data);
py(<<<p
import json
data=$data_en
#do something with data
p
);
?>
This would be perfect if I could think of a way to assign values to PHP variables inside the script, but it's not a bad workaround if you want a fusion of PHP and Python, or just a way to do everything in Python without writing a Python web server (which I have also done).

NLTK PunktSentenceTokenizer return type or a way to use it faster in an iterative function?

Within my PHP function, I am calling a Python script like this:
$foo = exec("python tokenize.py $bar");
The problem is, now I have built a function that executes the command above iteratively and it takes more than five minutes to finish, because of the code I use below:
train_text = state_union.raw("1963-Johnson.txt")
custom_sent_tokenizer = PunktSentenceTokenizer(train_text)
The operation of training my PST takes some time even for one of the shortest corpora in the state_union package.
I tried to store the output in a plain .txt file, but I cannot find the return type in the documentation here. I guess it is an iterator like everything else in the package, but I tried to convert the iterator to a list and failed miserably.
The questions are:
1. What is the return type of the PunktSentenceTokenizer and can I store it?
2. Will reading it from the .txt file or any other source be faster than training it over and over when executing my PHP program?
3. Do you have any other idea how to use PST so it remains trained over the same portion of text so I can use it with my script faster?
Why not pickle it?
import pickle
... # other imports and stuff
custom_sent_tokenizer = PunktSentenceTokenizer(train_text)
pickle.dump(custom_sent_tokenizer, open( "save.p", "wb" ))
Now you can easily load the trained tokenizer in another call or script:
>>> import pickle
>>> pickle.load(open( "save.p", "rb" ) )
<nltk.tokenize.punkt.PunktSentenceTokenizer object at 0x00000000023B9EB8>
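Building on the pickle suggestion, a load-or-train helper keeps the expensive step out of repeated runs. This is only a sketch: load_or_build and the cache file name are hypothetical, and a stand-in build function takes the place of PunktSentenceTokenizer(train_text) so that the example runs without NLTK.

```python
import os
import pickle
import tempfile

def load_or_build(cache_path, build):
    """Return the pickled object at cache_path, building and saving it if absent."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)
    obj = build()
    with open(cache_path, "wb") as fh:
        pickle.dump(obj, fh)
    return obj

calls = []

def expensive_build():
    # Stand-in for: PunktSentenceTokenizer(train_text)
    calls.append(1)
    return {"trained": True}

cache = os.path.join(tempfile.mkdtemp(), "tokenizer.pickle")
first = load_or_build(cache, expensive_build)   # builds and writes the cache
second = load_or_build(cache, expensive_build)  # loads from the cache
print("build calls:", len(calls))
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.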

Is there a limit on the length of command passed to exec in PHP?

I currently need to merge 50+ PDF files into one PDF. I am using PDFTK, following the guide at http://www.johnboy.com/blog/merge-multiple-pdf-files-with-php
But it is not working. I have verified the following:
I have tried the command to merge 2 PDFs from my PHP and it works.
I have echoed the final command, copied and pasted it into the command prompt, and run it manually; all 50 PDFs were merged successfully.
Thus exec in my PHP and the command to merge 50 PDFs are both correct, but it does not work when they are combined in PHP. I have also called set_time_limit(0) to prevent any timeout, but it still does not work.
Any idea what's wrong?
You can try to find out yourself:
print exec(str_repeat(' ', 5000) . 'whoami');
I think it's 8192, at least on my system, because it fails with strings larger than 10K, but it still works with strings shorter than 7K.
I am not sure if there is a restriction on how long a single command can be, but I am pretty sure you can split it across multiple lines with "\" to check whether that's the problem. Again, I don't think it is. Is there any error output when you try to run the full command with PHP and exec? Also try system() instead of exec().
PDFTK versions prior to 1.45 are limited to merging 26 files, because they use "handles":
/* Collate scanned pages sample */
pdftk A=even.pdf B=odd.pdf shuffle A B output collated.pdf
As you can see, "A" and "B" are "handles", and each must be a single upper-case letter, so only A-Z can be used. If you reach that limit, your script may output an error like
Error: Handle can only be a single, upper-case letter
But in 1.45 this limitation was removed. Changelog extract:
You can now use multi-character input handles. Prior versions were
limited to a single character, imposing an arbitrary limitation on
the number of input PDFs when using handles. Handles still must be all
upper-case ASCII.
Maybe you only need to update your lib ;)
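If handles are needed for more than 26 inputs on pdftk 1.45 or later, multi-letter upper-case handles can be generated programmatically. Here is a sketch with hypothetical helper names and file names; it only builds the argument list and does not run pdftk:

```python
import itertools
import string

def handles():
    """Yield A..Z, then AA, AB, ... (upper-case ASCII handles)."""
    for length in itertools.count(1):
        for letters in itertools.product(string.ascii_uppercase, repeat=length):
            yield "".join(letters)

def pdftk_merge_args(inputs, output):
    """Build a pdftk argument list that concatenates inputs in order."""
    names = list(itertools.islice(handles(), len(inputs)))
    args = ["pdftk"]
    args += [f"{h}={path}" for h, path in zip(names, inputs)]
    args += ["cat"] + names + ["output", output]
    return args

# Hypothetical input files page01.pdf .. page50.pdf
files = [f"page{i:02d}.pdf" for i in range(1, 51)]
args = pdftk_merge_args(files, "merged.pdf")
print(args[:3], "...", args[-3:])
```

For a plain in-order merge, pdftk should also accept the input files without any handles (pdftk in1.pdf in2.pdf cat output out.pdf), which sidesteps the handle limit entirely.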

General utility to remove/strip all comments from source code in various languages?

I am looking for a command-line tool that removes all comments from an input
file and returns the stripped output. It'd be nice if it supported popular
programming languages like c, c++, python, php, javascript, html, css, etc. It
has to be syntax-aware as opposed to regexp-based, since the latter will catch
the pattern in source code strings as well. Is there any such tool?
I am fully aware that comments are useful information and often leaving them
as they are is a good idea. It's just that my focus is on different use cases.
cloc, a free Perl script, can do this.
Remove Comments from Source Code
How can you tell if cloc correctly identifies comments? One way to convince yourself cloc is doing the right thing is to use its --strip-comments option to remove comments and blank lines from files, then compare the stripped-down files to originals.
It supports a lot of languages.
What you want can be done with emacs scripting.
I wrote this script for you which does exactly what you want and can be easily extended to any language.
Filename: kill-comments
#!/usr/bin/python
import subprocess
import sys
import os
target_file = sys.argv[1]
command = "emacs -batch -l ~/.emacs-batch " + \
target_file + \
" --eval '(kill-comment (count-lines (point-min) (point-max)))'" + \
" -f save-buffer"
#to load a custom .emacs script (for more syntax support),
#use -l <file> in the above command
#print command
fnull = open(os.devnull, 'w')
subprocess.call(command, shell = True, stdout = fnull, stderr = fnull)
fnull.close()
to use it just call:
kill-comments <file-name>
To add any language to it edit ~/.emacs-batch and add that language's major mode.
You can find syntax aware modes for basically everything you could want at http://www.emacswiki.org.
As an example, here is my ~/.emacs-batch file. It extends the above script to remove comments from javascript files. (I have javascript.el in my ~/.el directory)
(setq load-path (append (list (concat (getenv "HOME") "/.el")) load-path))
(load "javascript")
(setq auto-mode-alist (cons '("\\.js$" . javascript-mode) auto-mode-alist))
With the javascript addition this will remove comments from all the filetypes you mentioned as well as many more.
Good Luck and happy coding!
Paul Dixon's response to this question on stripping comments from a script might be worth looking at.
I don't know of such a tool - which isn't the same as saying there isn't one.
I once started to design one, but it quickly gets insane - not helped by the comment rules in C and C++.
/\
* Comment? *\
/
(Answer: yes!)
"/\
* Comment? *\
/"
(Answer: no!)
To do the job reasonably, you have to be aware of:
Language comment conventions
Language quoted string conventions (Python and Perl are enough to drive you insane here)
Escape conventions (Shell gets you here - along with the quotes)
These combine to make the job tolerably close to impossible.
I ended up with a program, scc, to strip C and C++ comments. Its torture test includes worse examples than the comments shown above - and it does a decent job. But extending that to do shell or Perl or Python or (take your pick) was sufficiently non-trivial that I did not do it.
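For a single language, a tokenizer-based approach is manageable. As a sketch of the syntax-aware idea for Python source only, the standard tokenize module can drop comment tokens while leaving '#' characters inside string literals alone. The function name strip_comments is hypothetical, and the padding that untokenize() leaves where a comment used to be is not cleaned up here:

```python
import io
import tokenize

def strip_comments(source):
    """Return Python source with comment tokens removed."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    kept = [tok for tok in tokens if tok.type != tokenize.COMMENT]
    return tokenize.untokenize(kept)

# The '#' inside the string literal must survive; the trailing comment must not.
SAMPLE = 's = "# not a comment"  # a real comment\nprint(s)\n'

stripped = strip_comments(SAMPLE)
print(stripped)
```

A regular expression that deletes everything after '#' would have damaged the string literal, which is exactly the failure mode the question wants to avoid.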
No such tool exists yet.
You might coax GNU Source-highlight into doing this.

Options for PHP CLI on windows

I'm working with the PHP CLI on windows at the moment to write some small desktop command-line apps.
I wanted to know if and how it may be possible to:
Clear the screen (cls would be the normal command but exec() won't work with it)
Change the color, change the color of parts of the output (seen this in programs before)
Make the command line horizontally bigger - things quickly get unreadable
Is any of the above possible from inside a PHP script?
On Windows, in the standard CLI prompt, you cannot output in colour (as in the answer by Spudley).
You can change the size of the window as a user by right-clicking the command window's title bar and selecting Properties, then amending values in the Layout tab. I do not think it is possible to amend the width of the CLI within PHP.
You can check the width of the CLI window on Windows using the function I wrote here
See the PHP manual page for working with the commandline
To directly answer each of your bullet points:
There is a comment on that page which gives a function that can clear the screen. I'll quote it here for you:
<?php
function clearscreen($out = TRUE) {
    $clearscreen = chr(27)."[H".chr(27)."[2J";
    if ($out) print $clearscreen;
    else return $clearscreen;
}
?>
There's also another comment which explains how to change the colours. Again, I'll quote it:
<?php
echo "\033[31m".$myvar; // red foreground
echo "\033[41m".$myvar; // red background
?>
and to reset:
<?php
echo "\033[0m";
?>
You should read through the rest of that page for a whole load more suggestions on how to manipulate the CLI.
The only part of your question left unanswered is the third bullet point. Sadly, I don't believe you'll be able to do this, and I don't think it's possible to horizontally resize the Windows command line window.
Hope that helps.
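The escape sequences quoted above are plain strings, so they are easy to wrap in small helpers. Here is a sketch in Python with hypothetical helper names; whether the sequences are honoured depends on the terminal, and older Windows consoles may print them literally:

```python
ESC = "\033"          # same byte as chr(27) in the PHP examples
RESET = ESC + "[0m"
CLEAR_SCREEN = ESC + "[H" + ESC + "[2J"  # cursor home, then erase display

def colorize(text, code):
    """Wrap text in an SGR colour code (31 = red foreground, 41 = red background)."""
    return f"{ESC}[{code}m{text}{RESET}"

if __name__ == "__main__":
    print(colorize("error", 31))
    print(colorize("alert", 41))
```

Appending the reset sequence inside the helper keeps one coloured string from bleeding into the output that follows it.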
I created a small backup script with PHP, and from what I can remember, you can print backspace characters to remove content. Not really ideal though.
Just googled it: http://www.php.net/manual/en/features.commandline.php#77142
As far as the third question goes, I suggest you switch the default command line to Console 2. It is a great replacement that not only lets you use any width you like (as long as it fits your screen), but also supports command history, tabs, and some UI sugar.
The provided code will not work under Windows because PHP under windows does something to the command window. I am not sure what PHP does but I wrote a simple Freebasic program with only two lines:
cls
end
I then compiled it and ran it under a regular command line window. It cleared the screen without any kind of a problem. I then did the following in PHP:
<?php
echo "This is a test\n";
system( "cls.exe" );
exec( "cls.exe" );
passthru( "cls.exe" );
?>
When I ran the program it did nothing more than just the "This is a test" line. Thus, there is some kind of suppression going on with PHP that looks for and stops any kind of escape sequence from occurring. Until this is fixed in PHP - you will never be able to do a cls, nor use curses, ncurses, or any other library. What has to be done is to integrate something like FreeBasic's windowing methods as some kind of a class (or maybe just a C set of routines) that will open a new window via THAT language's methodologies and use them to do the text window. Then all of the escape sequences will work. Until then - they won't.
What I find weird about this is that PHP was originally written in Perl and Perl will do ncurses on Windows without any problems. Perl will also allow all escape sequences to work. So there is just something being done on the Windows compile that is causing this problem.
