It's been an exciting couple of months since my last post. We're a fully established company with beta users and a viable beta product. We're hoping to close our first seed investment within the next few weeks!
One of the biggest decisions we must make in the coming week or two is choosing our development framework. In the world of Natural Language Processing, most technologies are built around two major language frameworks: Python and Java (along with libraries for other JVM languages). Other languages certainly have NLP libraries, but none are as extensive as those in Python and Java. Below are some of the major libraries we might use in each framework:
NLTK: Natural Language Toolkit. The largest open-source library for NLP, with ports of and wrappers for many other open-source libraries under varying licenses.
spaCy: An MIT-licensed core NLP library written mostly in Cython, with a Python API. Fast, effective, and easy to use.
scikit-learn: Perhaps the largest machine learning library in Python, with many algorithms that are well suited to text classification.
Textacy: A library, still under development, built on spaCy that provides higher-level NLP functionality.
LingPipe: A large, commercially licensed Java library for NLP engineering.
Each of these libraries can be used on the JVM alongside any of the libraries listed under Java or Scala.
ScalaNLP: A Scala-based project consisting of the libraries Breeze, Epic, and Puck. Epic is the NLP-focused library.
FACTORIE: A Scala-based library for discrete probabilistic inference, with NLP functionality that supersedes MALLET. Recommended by Andrew McCallum himself.
Duckling: Wit.ai's parser, written in Clojure, which works much like a Probabilistic Context-Free Grammar. Each rule is assigned a probability, and the parse whose accumulated rule probabilities are highest wins (that's roughly how it works).
Can be run as an executable JAR.
Smile: A machine learning library with a Scala API and lots of ML algorithms. It looks like it could become the scikit-learn of Scala/Java.
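To make the Duckling idea above a little more concrete, here is a minimal Python sketch of probability-weighted rule parsing. It is a toy CKY-style chart parser, not Duckling's actual implementation (Duckling is written in Clojure, and its rules and learned weights differ). The rule names, the probabilities, and the "5 pm" ambiguity (p.m. versus picometres) are all hypothetical, chosen only for illustration. Each rule contributes its log-probability, a candidate's score is the sum over every rule in its derivation, and the highest-scoring candidate covering the whole input wins.

```python
import math
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Token:
    """A candidate interpretation of words[start:end]."""
    kind: str
    value: object
    start: int
    end: int
    score: float             # sum of log-probabilities of every rule used
    rules: tuple[str, ...]   # rule names in the derivation, for inspection


@dataclass(frozen=True)
class TerminalRule:
    name: str
    prob: float              # hypothetical weight, not a real Duckling value
    pattern: str             # regex that must match one whole word
    kind: str
    value: Callable[[str], object]


@dataclass(frozen=True)
class BinaryRule:
    name: str
    prob: float              # hypothetical weight, not a real Duckling value
    left: str                # kind required for the left child
    right: str               # kind required for the right child
    kind: str
    value: Callable[[object, object], object]


def parse(text, terminals, binaries):
    """Return the highest-scoring Token covering all of text, or None."""
    words = text.split()
    n = len(words)
    chart = {}  # chart[(i, j)] holds every candidate covering words[i:j]
    for i, w in enumerate(words):
        chart[(i, i + 1)] = [
            Token(r.kind, r.value(w), i, i + 1, math.log(r.prob), (r.name,))
            for r in terminals
            if re.fullmatch(r.pattern, w)
        ]
    # Build longer spans by combining two adjacent shorter spans.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = []
            for k in range(i + 1, j):
                for left in chart[(i, k)]:
                    for right in chart[(k, j)]:
                        for r in binaries:
                            if left.kind == r.left and right.kind == r.right:
                                cell.append(Token(
                                    r.kind,
                                    r.value(left.value, right.value),
                                    i, j,
                                    # Accumulate: children's scores plus this rule's log-probability.
                                    left.score + right.score + math.log(r.prob),
                                    left.rules + right.rules + (r.name,),
                                ))
            chart[(i, j)] = cell
    return max(chart.get((0, n), []), key=lambda t: t.score, default=None)


def to_24h(hour, suffix):
    return f"{hour % 12 + (12 if suffix == 'pm' else 0):02d}:00"


# Hypothetical rule set: "pm" can mean "p.m." or "picometres".
TERMINALS = [
    TerminalRule("integer", 0.9, r"\d+", "number", int),
    TerminalRule("am/pm", 0.7, r"(?i)(?:am|pm)", "ampm", str.lower),
    TerminalRule("picometre", 0.05, r"pm", "unit", lambda w: "pm"),
]
BINARIES = [
    BinaryRule("<number> <am/pm>", 0.8, "number", "ampm", "time", to_24h),
    BinaryRule("<number> <unit>", 0.9, "number", "unit", "distance",
               lambda n, u: f"{n} {u}"),
]

best = parse("5 pm", TERMINALS, BINARIES)
print(best.kind, best.value, round(math.exp(best.score), 4))  # prints: time 17:00 0.504
```

Here the time reading scores 0.9 × 0.7 × 0.8 = 0.504, while the picometre reading scores 0.9 × 0.05 × 0.9 = 0.0405, so the time wins. Summing log-probabilities instead of multiplying raw probabilities keeps long derivations from underflowing to zero.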
We're leaning towards the Java/JVM ecosystem because of Duckling's performance and my own development background. Python is great for prototyping and is incredibly expressive, quick, and fun. However, Scala has its own productivity benefits and integrates very well with existing Java code and JARs. We'll keep researching both to see which gives us the best chance of success!