Text mining with PHP [closed] - php

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
I'm doing a project for a college class I'm taking.
I'm using PHP to build a simple web app that classifies tweets as "positive" (or happy) and "negative" (or sad) based on a set of dictionaries. The algorithm I'm thinking of right now is a Naive Bayes classifier or a decision tree.
However, I can't find any PHP library that helps me do some serious language processing. Python has NLTK (http://www.nltk.org). Is there anything like that for PHP?
I'm planning to use WEKA as the back end of the web app (by calling WEKA on the command line from within PHP), but that doesn't seem very efficient.
Do you have any idea what I should use for this project? Or should I just switch to Python?
Thanks

If you're going to be using a Naive Bayes classifier, you don't really need a whole ton of NL processing. All you'll need is an algorithm to stem the words in the tweets and if you want, remove stop words.
Stemming algorithms abound and aren't difficult to code. Removing stop words is just a matter of searching a hash map or something similar. I don't see a justification to switch your development platform to accommodate the NLTK, although it is a very nice tool.
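To make the preprocessing step concrete, here is a minimal sketch of stop-word removal via a hash lookup, followed by a very crude suffix stripper. The stop-word list, the function names and the suffix rules are all assumptions made for illustration; the stripper is only a stand-in for a real stemming algorithm such as Porter's, not an implementation of one.

```php
<?php
// Hypothetical stop-word list. The words are array keys, so isset() is a hash lookup.
$stopWords = array_flip(['a', 'an', 'the', 'is', 'am', 'i', 'so', 'to', 'and', 'of', 'in']);

// Crude suffix stripper: a stand-in for a real stemmer, not Porter's algorithm.
function crudeStem(string $word): string
{
    foreach (['ing', 'ed', 's'] as $suffix) {
        // Only strip when at least three characters of the word remain.
        if (strlen($word) > strlen($suffix) + 2 && substr($word, -strlen($suffix)) === $suffix) {
            return substr($word, 0, -strlen($suffix));
        }
    }
    return $word;
}

// Turn a tweet into a list of stemmed words with the stop words removed.
function tweetToFeatures(string $tweet, array $stopWords): array
{
    // Lowercase, then split on anything that is not a letter or an apostrophe.
    $tokens = preg_split("/[^a-z']+/", strtolower($tweet), -1, PREG_SPLIT_NO_EMPTY);
    $features = [];
    foreach ($tokens as $token) {
        if (!isset($stopWords[$token])) {
            $features[] = crudeStem($token);
        }
    }
    return $features;
}
```

Calling tweetToFeatures("I am loving the sunny days", $stopWords) would give the words lov, sunny and day, which can then be fed to whatever classifier you pick.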

I did a very similar project a while ago - only classifying RSS news items instead of twitter - also using PHP for the front-end and WEKA for the back-end. I used PHP/Java Bridge which was relatively simple to use - a couple of lines added to your Java (WEKA) code and it allows your PHP to call its methods. Here's an example of the PHP-side code from their website:
<?php
require_once("http://localhost:8087/JavaBridge/java/Java.inc");
$world = new java("HelloWorld");
echo $world->hello(array("from PHP"));
?>
Then (as someone has already mentioned), you just need to filter out the stop words. Keeping a txt file for this is pretty handy for adding new words (they tend to pile up when you start filtering out irrelevant words and account for typos).
The Naive Bayes model makes a strong feature-independence assumption, i.e. it doesn't account for words that are commonly paired (such as an idiom or phrase) and treats each word as an independent occurrence. However, it can outperform some of the more complex methods (such as word-stemming, IIRC) and should be perfect for a college class without making it needlessly complex.
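For reference, the independence assumption is easy to see in code. The following is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, assuming the tweets have already been split into words. The class and method names are made up for this example and do not come from WEKA or any PHP library.

```php
<?php
// Minimal multinomial Naive Bayes with add-one smoothing.
// TinyNaiveBayes and its methods are illustrative names, not a library API.
final class TinyNaiveBayes
{
    private array $wordCounts = []; // label => [word => count]
    private array $docCounts = [];  // label => number of training documents
    private array $vocabulary = []; // word => true

    public function train(string $label, array $words): void
    {
        $this->docCounts[$label] = ($this->docCounts[$label] ?? 0) + 1;
        foreach ($words as $word) {
            $this->wordCounts[$label][$word] = ($this->wordCounts[$label][$word] ?? 0) + 1;
            $this->vocabulary[$word] = true;
        }
    }

    public function classify(array $words): string
    {
        $totalDocs = array_sum($this->docCounts);
        $vocabSize = count($this->vocabulary);
        $bestLabel = '';
        $bestScore = -INF;
        foreach ($this->docCounts as $label => $docs) {
            $counts = $this->wordCounts[$label] ?? [];
            $totalWords = array_sum($counts);
            // Work in log space so that many small probabilities do not underflow.
            $score = log($docs / $totalDocs);
            foreach ($words as $word) {
                // Each word contributes on its own: this is the "naive" assumption.
                $score += log((($counts[$word] ?? 0) + 1) / ($totalWords + $vocabSize));
            }
            if ($score > $bestScore) {
                $bestScore = $score;
                $bestLabel = (string) $label;
            }
        }
        return $bestLabel;
    }
}
```

Train it with calls like train('positive', ['love', 'great', 'happy']) and train('negative', ['hate', 'awful', 'sad']), then call classify() with the words of a new tweet. Word order and word pairs play no role in the score, which is exactly the limitation described above.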

You can also use the uClassify API to do something similar to Naive Bayes. You basically train a classifier as you would with any algorithm (except here you're doing it via the web interface or by sending XML documents to the API). Then whenever you get a new tweet (or batch of tweets), you call the API to have it classify them. It's fast and you don't have to worry about tuning it. Of course, that means you lose the flexibility you get by controlling the classifier yourself, but that also means less work for you if that in itself is not the goal of the class project.

Try OpenCalais - http://viewer.opencalais.com/ . It has an API, PHP classes and more. Also consider LingPipe for this task - http://alias-i.com/lingpipe/index.html

You can check this library: https://github.com/Dachande663/PHP-Classifier. It is very straightforward.

You can also use Thrift or Gearman to talk to NLTK from PHP.

Related

When can I start using a Framework (Laravel)? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
I want to start by saying that I searched a lot for this exact question, but none of them satisfied my needs.
I program PHP, MySQL, HTML, CSS and JavaScript the 'old way', using just a text editor and building every website from scratch. I have built websites ranging from the very simple to an almost complete e-commerce site, just by coding every piece of the application. The most advanced thing I did was use some simple classes, like a database wrapper or a singleton; for the rest I always used functions.
Now, recently I signed up for a website that offers courses (I won't say its name because I don't know if I'm allowed to) and I followed one about Laravel 3 (I know the current version is 4.x), and I must admit I fell in love with it. I like it very much and I want to start using it, but I'm afraid that doing so will 'dumb me down'.
What I mean is that Laravel has a lot of helper functions, Eloquent and so on, so by using it I won't learn pure PHP any more, because for everything you need there is already a built-in helper function.
To give a very simple example, if you want to join some tables you use Eloquent and within literally 3 seconds you accomplish this. If you want to log a user in, again you have an Auth class that does everything for you, even setting sessions.
This is my biggest fear, that I won't learn anything anymore because all you need is already provided, you don't have to think that much anymore.
On the other side, Laravel helps you a lot and it eases your work very much. As much as I want to start digging into it more I can't help but fear its downsides.
So, do you think I should wait and learn more traditional PHP before diving into a framework?
When is the right time to start using one?
Look at all the sites you built. Identify redundant elements. Extract them into classes and functions and build your own framework. This will allow you to build sites faster and build a library. Once you do that, there's no dumbing down. You can choose to use another or not... but you'll have yours too.
That's what I did. I have my own framework. And it ain't bad!
There are two types of developers:
users - they can use stuff and get by
actual developers - they can build stuff from scratch and give users tools
Choose which model fits your needs best.
The 1st category goes for quick results; they are efficient and get the job done. These guys should use 3rd party frameworks and libraries.
The 2nd category are artists pushing themselves further with each new piece of code they build. They go for performance over turnaround time, code beauty and functionality vs. just functionality, etc... These guys feel offended by 3rd party frameworks and libraries and always roll their own. Because they can!
There's another catch. Some frameworks might have too much fat for your needs. Building more specialized solutions might actually yield way better performance than a one-size-fits-all framework. That's another perspective.
Good luck going forward :)

generate PHP classes from XSD? [closed]

Closed 10 months ago.
Are there any analogues of JavaBeans or JAXB for PHP?
Is it possible to generate PHP classes from an XML schema?
It's common practice to publish APIs as XSD schemas. Java and C# developers can take advantage of this by generating classes right from the XSD. Is there a similar tool for PHP?
I'm working on this issue now and will release the tool as soon as it reaches a more or less stable state. Check here: https://web.archive.org/web/20111026063725/http://mikebevz.com/xsd-to-php-tool/
Upd. I've just released the first working prototype. It works fine with UBL 2.0 schemas and one simple schema, but more serious testing is on the way. I'd appreciate it if you sent me the schemas you're working with, so I can include them in the test suite.
Upd. 2. XSD2PHP reached version 0.0.5. Check the progress on https://github.com/moyarada/XSD-to-PHP
The main reasons for using XSD class generators are:
Compile-time checking
An easier syntax than plain old XML APIs
Auto-completion in your IDE
Now contrast this with PHP. PHP does not have compile-time checking, and it supports dynamic methods and properties. This voids two of the main reasons above and makes this a non-issue unless you really need auto-completion. In other words, there is little reason to use an XSD class generator in PHP, and that is probably also why none exist.
My suggestion is to use PHP's SimpleXML, which creates properties to match the XML dynamically at runtime. If you validate your XML against the XSD file and then create a SimpleXML object, you have your XML object structure complete with methods and properties, without having to generate code. A perfectly good approach in PHP.
Note that I'm not claiming that SimpleXML is the same as generated XSD classes; of course it isn't. But it is pretty close, usage- and API-wise. You still end up doing something like $company->employee[2]->firstname either way.
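A minimal sketch of that validate-then-load approach, using DOMDocument::schemaValidateSource() and simplexml_import_dom(). The helper name loadValidatedXml, the tiny schema and the sample document are assumptions made up for this example; with a schema file on disk you would use schemaValidate() with a path instead.

```php
<?php
// Validate an XML string against an XSD string, then expose it through SimpleXML.
// Returns null when the document is malformed or does not conform to the schema.
// loadValidatedXml is a hypothetical helper name, not part of PHP.
function loadValidatedXml(string $xml, string $xsd): ?SimpleXMLElement
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $valid = $dom->loadXML($xml) && $dom->schemaValidateSource($xsd);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $valid ? simplexml_import_dom($dom) : null;
}

// Made-up schema: a company holds one or more employees, each with a firstname.
$xsd = <<<'XSD'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="company">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="employee" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="firstname" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

$xml = '<company>'
     . '<employee><firstname>Ann</firstname></employee>'
     . '<employee><firstname>Bob</firstname></employee>'
     . '<employee><firstname>Cy</firstname></employee>'
     . '</company>';

$company = loadValidatedXml($xml, $xsd);
if ($company !== null) {
    // Properties are created dynamically from the element names.
    $third = (string) $company->employee[2]->firstname;
}
```

The schema does the checking up front, and everything after that is plain property access, without any generated classes.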
This seems to do a decent job https://github.com/goetas-webservices/xsd2php
I wish it handled enumeration validation, but seemed to work ok in my use case. I found the META .yml files it generates helpful.
XSD schemas are often embedded in the WSDL files of SOAP web services.
wsdl2php is a tool for parsing WSDL (XSD) schemas into PHP classes. It uses PHP's native SoapClient as its client:
https://github.com/jbarciauskas/wsdl2php
This library seems to be the best choice nowadays:
https://github.com/goetas/xsd2php
It generates PHP classes for XML Elements and can convert it back and forth:
XML -> PHP -> XML
I looked into that a while ago, and I certainly could not find one. If your schema is simple, there's a guy who hacked a simple version together for flat schemas.
That's all I know about. Normally these guys are good at supporting languages other than the main ones, but they don't do PHP either.
The DMS Software Reengineering Toolkit is configurable code generation machinery that can be used to process arbitrary formal documents as input. DMS can be used to generate code in arbitrary output languages.
We have used it to generate native Java and COBOL XML readers and writers from DTDs, which are the elder cousin of schemas. The same ideas would be easily applied to PHP.
There is another recent tool called PiBX, a JiBX-inspired tool.
From the site:
PiBX is an XML-Data-Binding framework for PHP.
With PiBX you can generate PHP classes based off an available
XML-Schema. These classes can be used to marshal the informations to
XML without hassling with schema checks, constraints or restrictions.

Minimalistic visitor stats based on PHP? [closed]

Closed 3 years ago.
Does anybody know a minimalistic, nice-looking visitor statistics suite based on PHP that displays visitor stats in an end-user-friendly way?
I know Google Analytics, and the big names in PHP and Perl based traffic analysis; they are all too complicated and feature-rich for what I need. I am looking for something that is already totally simplified and that I don't have to strip down.
Required features:
Visitors today, this week, this month
Where visitors came from
A good referer overview
Visitors on this page
Good filtering of bots
Optional:
Can connect to built-in IP locator thingy, I forget the name... Anyway, I have good IP to country resolution based on one of the big providers' functions in the provider's $_SERVER variable
A nice API and/or source code documentation to extend / interact with would be a plus.
There is no access to the server logs on the server I want to use this on, so the suite would have to bring its own tracking facilities, be that a PHP include, an image or a script.
Open Source would be nice, but I would consider paid solutions as well, as long as they're scripts shipped with source. I want no dependencies on external services.
Thanks in advance!
I liked the look of Piwik; however, it may be a bit feature-heavy for you. It is aimed at being a FOSS alternative to Google Analytics.
In most cases, you'd roll your own. Take a look at PHP's $_SERVER documentation (that is where the visitor's IP address, referer and user agent are exposed) and write this data into a SQL table. You can do this portion very easily in 20 lines or less, and if you're clever - probably 5 or so.
Now, displaying that data can be done in any method you so choose, since you've got all the data in an SQL table. Sort, filter, and organize using any method you please.
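A rough sketch of the roll-your-own approach, assuming PDO with SQLite and reading the request metadata that PHP exposes in $_SERVER. The table layout and the function names are made up for this example, and bot filtering is left out entirely.

```php
<?php
// Create the (hypothetical) visits table if it does not exist yet.
function createVisitsTable(PDO $db): void
{
    $db->exec('CREATE TABLE IF NOT EXISTS visits (
        visited_at TEXT NOT NULL,
        ip TEXT,
        page TEXT,
        referer TEXT,
        user_agent TEXT
    )');
}

// Store one page view. $server is normally $_SERVER; passing it in as a
// parameter keeps the function usable outside of a web request.
function recordVisit(PDO $db, array $server): void
{
    $stmt = $db->prepare(
        'INSERT INTO visits (visited_at, ip, page, referer, user_agent) VALUES (?, ?, ?, ?, ?)'
    );
    $stmt->execute([
        gmdate('Y-m-d H:i:s'),
        $server['REMOTE_ADDR'] ?? null,
        $server['REQUEST_URI'] ?? null,
        $server['HTTP_REFERER'] ?? null,
        $server['HTTP_USER_AGENT'] ?? null,
    ]);
}

// Count visits per page, most visited first. Returns [page => hits].
function visitsPerPage(PDO $db): array
{
    return $db
        ->query('SELECT page, COUNT(*) AS hits FROM visits GROUP BY page ORDER BY hits DESC')
        ->fetchAll(PDO::FETCH_KEY_PAIR);
}
```

On a real page you would open the connection once, for example with new PDO('sqlite:/path/to/stats.db') (the path is a placeholder), and call recordVisit($db, $_SERVER) near the top of each request. The display side is then just more SELECT queries like visitsPerPage().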
Perhaps one of the most well-known PHP-based analytics applications out there is Mint (http://haveamint.com/). It's not as feature rich as other analytics apps ... it may be too feature rich for what you are looking for.
Google Analytics is by far the most used of all statistic software and is the most reliable.
You get a global map of where in the world they're coming from, what specific pages they come from, duration on the site.
I just started using Clicky, which I am very happy with. Was using Google Analytics before, but this is a lot cleaner and clearer.
If you want to display stats to visitors you can either allow public access through the preferences, or you can use some of the widgets they provide. I would probably go with the latter.
Another cool thing is that you can actually watch real-time statistics. For example, they have a map where dots pop up when someone enters your site. Fun, fun, fun ;D
If you are a beginner, you can try this easy and simple option, although it is JavaScript-based, to get all the stats: http://www.eaglestats.com/
Or if you want a very simple counter in PHP, without stats, try this: http://www.phpsimple.net/tutorials/real_visitor_counter/
I have myself discarded Google Analytics, Piwik/Matomo and OWA, as they are all bloated overkill for my needs (but I self-host many sites). I am preparing to write my own analytics, because Mint cannot be downloaded anymore.
If anyone knows of someone else who has started an interesting open source project focusing on minimalistic analytics, I'd sure like to know about it, as google search doesn't give many options.

Trying to find a PHP5 API-based embeddable CMS [closed]

Closed 4 years ago.
I've been making the rounds for a CMS that I can use as an API, in a sort of "embedded" mode. I mean by this that I don't want the CMS to do any logic or presentation. I want it to be used as an API, which I can then use within an existing site. I don't want to be tied to the architecture of the CMS.
A good example of this is NC-CMS (http://www.nconsulting.ca/nc-cms/). All it needs is an include at the top; then, wherever editable content is desired, it's only a function call with a unique label. It's also perfect in the sense that it lets you differentiate between small strings (like titles and labels) and texts (which require a rich-text editor).
It's the only CMS I found that fits this description, but it is a little too light as it does not handle site structure. I need to be able to allow my client to add pages, choosing an existing template for the layout. A minimal back-end is required.
WordPress also fits some requirements, in that it handles only content editing and allows freedom for the themes by letting them call the content where and how they want it. But it is article-based and backwards, in that it embeds sites (as themes) within its structure, rather than being embeddable in sites like NC.
It's funny how, checking out all the CMSs out there, almost all of them claim that most CMSs are not self-sufficient, that they do not handle application logic, while (almost) every single one I found, with only one exception, does so. Many are mostly article-based blog engines, which do not fit my need.
I would appreciate any CMS that fits the general description.
Creator of nc-cms here.
Adding on to nc-cms may be a realistic option, depending on exactly what you want to do. The entire nc-cms project is under 2,000 lines in total and the codebase is kept rather clean and simple for the very reason of per project/client expandability.
It wouldn't be all that hard to make one, honestly. Maybe as a wrapper around the nc-cms system after taking a look (possibly using and abusing ob_start/ob_get_contents/ob_end_clean).
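To illustrate the output-buffering idea: the sketch below captures whatever a page prints and then post-processes it. The {{label}} placeholder syntax and both function names are assumptions invented for this example; this is not how nc-cms itself works.

```php
<?php
// Run a callable that echoes its output and return what it printed as a string.
function captureOutput(callable $render): string
{
    ob_start();
    try {
        $render();
        return ob_get_contents();
    } finally {
        // Always discard the buffer, even if $render throws.
        ob_end_clean();
    }
}

// Hypothetical post-processing step: replace {{label}} placeholders with
// stored content, escaping it for HTML.
function fillPlaceholders(string $html, array $content): string
{
    return preg_replace_callback('/\{\{(\w+)\}\}/', function (array $match) use ($content) {
        return htmlspecialchars($content[$match[1]] ?? '');
    }, $html);
}
```

A wrapper could then do something like echo fillPlaceholders(captureOutput($page), $storedContent), where $page renders the existing site template and $storedContent comes from wherever the editable content is kept.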
I've been putting one together using PHP5 constructs and the Dwoo templating engine. Dwoo's template inheritance makes this a breeze. Right now it works by abusing the auto_prepend_file php directive to set up the template object and then just uses REQUEST_URI to process the template file (which is the actual file being requested). Then it outputs the processed template and exits. Kinda slick, but may not have that big of an audience.
I'm not exactly sure where you are placing the line between what you want this system to do and not do. Adding pages and choosing templates would seem to me to be in the realm of presentation, imo.
Would Joomla do it?
You should look into Osmek; it's a developer's dream. It's a centrally hosted system with no install. Osmek's API gives you access to your entire account, in just about any format, including JSON, XML, HTML, serialized PHP, and template responses.

What are best practices for developing consistent libraries? [closed]

Closed 5 years ago.
I am working on developing a pair of libraries to work with a REST API. Because I need to be able to use the API in very different settings I'm currently planning to have a version in PHP (for web applications) and a second version in Python (for desktop applications, and long running processes). Are there any best practices to follow in the development of the libraries to help maintain my own sanity?
So, the problem with developing parallel libraries in different languages is that often times different languages will have different idioms for the same task. I know this from personal experience, having ported a library from Python to PHP. Idioms aren't just naming: for example, Python has a good deal of magic you can use with getters and setters to make object properties act magical; Python has monkeypatching; Python has named parameters.
With a port, you want to pick a "base" language, and then attempt to mimic all the idioms in the other language (not easy to do); for parallel development, not doing anything too tricky and catering to the least common denominator is preferable. Then bolt on the syntax sugar.
'Be your own client' : I've found that the technique of writing tests first is an excellent way of ensuring an API is easy to use. Writing tests first means you will be thinking like a 'consumer' of your API rather than just an implementor.
Try to write a common unit test suite for both. Maybe by wrapping a class in one language for calling it from the other. If you can't do it, at least make sure the two versions of the tests are equivalent.
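One way to keep the two test suites equivalent is to put the test cases in a language-neutral data file that both the PHP and the Python tests read. The sketch below shows the PHP side only. The buildUrl function, the example.com URLs and the JSON layout are all assumptions invented for this example; in practice the JSON would live in a shared file rather than inline.

```php
<?php
// Hypothetical library function under test: builds a request URL for the REST API.
function buildUrl(string $base, string $resource, array $query = []): string
{
    $url = rtrim($base, '/') . '/' . ltrim($resource, '/');
    return $query ? $url . '?' . http_build_query($query) : $url;
}

// Language-neutral test cases, inlined here to keep the example self-contained.
$casesJson = <<<'JSON'
[
  {"base": "https://api.example.com/", "resource": "/users", "query": {},
   "expected": "https://api.example.com/users"},
  {"base": "https://api.example.com", "resource": "users", "query": {"page": 2},
   "expected": "https://api.example.com/users?page=2"}
]
JSON;

// Run every shared case and return a list of failure messages (empty on success).
function runSharedCases(string $casesJson): array
{
    $failures = [];
    foreach (json_decode($casesJson, true) as $i => $case) {
        $actual = buildUrl($case['base'], $case['resource'], $case['query']);
        if ($actual !== $case['expected']) {
            $failures[] = "case $i: expected {$case['expected']}, got $actual";
        }
    }
    return $failures;
}
```

The Python suite would load the same cases and run them against its own implementation, so a new case added to the file is automatically checked in both languages.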
Well, the obvious one would be to keep your naming consistent. Functions and classes should be named similarly (if not identically) in both implementations. This usually happens naturally whenever you implement an API separately in two different languages. The big ticket item though (at least in my book) is to follow language-specific idioms. For example, let's assume that I were implementing a REST API in two languages I'm more familiar with: Ruby and Scala. The Ruby version might have a class MyCompany::Foo which contains method bar_baz(). Conversely, the Scala version of the same API would have a class com.mycompany.rest.Foo with a method barBaz(). It's just naming conventions, but I find it goes a long way to helping your API to feel "at home" in a particular language, even when the design was created elsewhere.
Beyond that I have only one piece of advice: document, document, document. That's easily the best way to keep your sanity when dealing with a multi-implementation API spec.
AFAICT there are a lot of bridges from and to scripting languages. Take JRuby, for example: it's Ruby on Java. Then there are things to embed Ruby in Python (or the other way around). Then there are examples like Etoile, where the base is Objective-C but there are also bridges to Python and Smalltalk. Another approach in wide use is wrapping C libraries; examples are libxml2, libcurl, etc. Maybe this could be the base. Let's say you write everything in Python but implement a bridge to PHP. Then you do not have that much parallel development.
Or maybe it's not the worst idea to base that stuff on, let's say, .NET. Then you suddenly have a whole bunch of languages at your disposal, which in principle should be usable from every other language on the .NET platform.
Why not use Python for web applications too? There are several frameworks available: Django; web2py, which is similar to Django but which many say is simpler to use; and also TurboGears, web.py and Pylons.
Along the lines of bridging: you could use interprocess communication to have a PHP and a Python application (in daemon mode) talk to each other.
