A curated list of books, libraries, apps and papers we love at Tryolabs. We work with blazing startups and help them build complex projects using Python, NLP & Machine Learning.
We create amazing Internet & Mobile products for blazing startups. We combine the Python ecosystem with Machine Learning and Natural Language Processing technologies to create heavy backend apps with artificial intelligence components. We follow agile methodologies in order to develop MVPs and full products the lean way.
A very useful development tool that creates an isolated Python environment for each project, keeping the project's libraries separate from those installed on the system.
Package manager for iOS projects. It handles the setup and update of Xcode projects to speed up the integration of new components.
CLI for iOS projects. It provides various tools to perform common tasks from the command line (e.g. generate, sign and distribute an ipa over the air).
Vagrant is a tool for creating isolated, reproducible development environments using virtual machines. It is usually used with VirtualBox, but supports VMware and other virtualization systems.
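As a rough illustration of per-project isolation, the sketch below creates an environment with Python's standard-library `venv` module, which is swapped in here for the tool described above so the example has no third-party dependencies. The helper names `create_project_env` and `env_python` and the `.venv` directory name are assumptions made for this example.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_project_env(project_dir: Path) -> Path:
    """Create an isolated environment in <project_dir>/.venv and return its path."""
    env_dir = project_dir / ".venv"  # hypothetical directory name
    # with_pip=False keeps the sketch fast and offline; a real project would
    # normally pass with_pip=True so packages can be installed into the env.
    venv.create(env_dir, with_pip=False)
    return env_dir


def env_python(env_dir: Path) -> Path:
    """Return the path of the interpreter that belongs to the environment."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env = create_project_env(Path(tmp))
        print("environment created at", env)
        print("interpreter:", env_python(env))
```

Running the project with the interpreter returned by `env_python` means it only sees the libraries installed into that environment, not those installed system-wide.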
Docker is a tool for creating and managing software containers.
Metamon is a tool to automatically set up an isolated execution environment for Django applications.
Just use Git. Good resources are the Pro Git book by Scott Chacon and GitHub's help site.
PEP 8 is the definitive reference for Python coding style. The pep8 package can be used to scan code and find parts that don't conform to it.
With Emacs, the emacs-pep8 package can be used to run the pep8.py script.
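To give a feel for what a style scan does, here is a toy checker for three rules: PEP 8's 79-character line limit, trailing whitespace, and tabs used for indentation. It is not the pep8 package and covers only a tiny fraction of what that tool reports. The function name `find_style_issues` and the message texts are made up for this sketch.

```python
from __future__ import annotations

MAX_LINE_LENGTH = 79  # limit recommended by PEP 8 for code lines


def find_style_issues(source: str) -> list[tuple[int, str]]:
    """Return (line_number, message) pairs for a few simple style violations."""
    issues = []
    for number, line in enumerate(source.splitlines(), start=1):
        if len(line) > MAX_LINE_LENGTH:
            issues.append((number, f"line too long ({len(line)} characters)"))
        if line != line.rstrip():
            issues.append((number, "trailing whitespace"))
        # The leading whitespace is whatever lstrip() would remove.
        indentation = line[: len(line) - len(line.lstrip())]
        if "\t" in indentation:
            issues.append((number, "tab used for indentation"))
    return issues


if __name__ == "__main__":
    sample = "x = 1\ny = 2   \n\tz = 3\n"
    for number, message in find_style_issues(sample):
        print(f"line {number}: {message}")
```

In practice the real tool is the better choice; the point here is only that such a scan is a line-by-line pass over the source that reports where each rule is broken.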
We use Ansible for all our deployment and server orchestration tasks.
Just use Postgres. It's not just a database, it's a complete "relational database framework" that provides full-text search, GIS and extensive documentation of every knob and lever.
Are you sure Postgres can't do what you want?
This list of books represents, in our opinion, a good balance between theory and practice. We don't expect everyone to read all of these; rather, everyone should pick a few books from this common list.
- Machine Learning: The Art and Science of Algorithms that Make Sense of Data
- Learning scikit-learn: Machine Learning in Python
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction
- Principles of Data Mining
- Foundations of Statistical Natural Language Processing
- Bayesian Reasoning and Machine Learning
- Gaussian Processes for Machine Learning
- Information Theory, Inference and Learning Algorithms
- Managing Gigabytes: Compressing and Indexing Documents and Images
- Introduction to Information Retrieval
- Information Retrieval: Algorithms and Heuristics
- Concise Computer Vision
- Computer Vision: Algorithms and Applications
- Learning OpenCV: Computer Vision in C++ with the OpenCV Library
- Functional Geometry
- Pictures: A simple structured graphics model
- The Problem with Threads (Threads are Evil)
First things first: machines are meant to be identical. Ansible provisions your local Vagrant box the same way it provisions a server. This way the production environment is the same as the development one, and we avoid hard-to-find bugs while being fairly certain that if something works in dev, it will work in prod.
Specifically, machines look like this:
- The application runs inside a virtualenv, even if it's the only application on the server. This makes it easy to add other applications should the need arise. For instance, you might want to run an IPython Notebook server with a notebook that provides analytics and charts of the data in your database, without contaminating the app's environment with IPython's dependencies.
- Nginx is used as a reverse proxy, sending requests from the Internet to the Django server and responses the other way around. Nginx can take care of load balancing, caching, HTTP acceleration and some degree of security.
- Supervisor keeps the actual application server running and also runs other scripts or processes. The output of every process is logged to disk for debugging.
- Postgres is the database, of course.
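The process-supervision idea from the list above can be sketched in a few lines of Python: start a command, send its output to a log file, and start it again if it exits with an error. This is a toy model and not how Supervisor itself is implemented or configured. The function name `keep_running`, the `max_restarts` parameter and the log format are assumptions made for this example.

```python
from __future__ import annotations

import subprocess
import sys
import tempfile
from pathlib import Path


def keep_running(command: list[str], log_path: Path, max_restarts: int = 3) -> int:
    """Run command, restarting it after a failed exit; return how often it was started."""
    starts = 0
    # Append mode, so output from every run ends up at the end of the same file.
    with open(log_path, "ab") as log:
        while starts <= max_restarts:
            starts += 1
            log.write(f"--- start #{starts}\n".encode())
            log.flush()  # write our marker before the child starts writing
            # stdout and stderr of the child both go to the log file.
            result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
            if result.returncode == 0:
                break  # clean exit: nothing to restart
    return starts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_file = Path(tmp) / "app.log"
        crashing = [sys.executable, "-c", "import sys; print('boom'); sys.exit(1)"]
        print("started", keep_running(crashing, log_file, max_restarts=2), "times")
        print(log_file.read_text())
```

The cap on restarts only keeps the example finite; a real supervisor also handles concerns such as back-off between restarts, signals and log rotation.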
Our tech stack looks roughly like this on most projects:
This is, of course, an approximation. Some projects use NoSQL databases in addition to relational ones, others add components such as message queues, and some use specific tools like Varnish instead of Nginx for HTTP acceleration.