Rich Context Skills

The following isn't an offer for a job opening, nor does it represent hiring policy for any particular organization.

That said, if you're interested in joining a team such as the one that produces the Rich Context knowledge graph, here are a few notes about what kind of skills are needed most.

Why Explain This?

A strange notion sometimes circulates among people in management: that if a team simply hires someone with the title "data scientist" or "machine learning engineer" and has them work in the cloud, using Agile and expensive proprietary vendor platforms, then magical things will somehow result. Clearly, that's a misunderstanding.

Speaking of misunderstandings, the most valuable skills for this kind of work probably don't align with current data science curricula in university and related programs. Aside from a few graduate computer science programs -- and even fewer R&D organizations in industry -- these skills aren't taught much.

That raises a question: what is needed? We're eager to build a diverse and inclusive team, and to help many people find out more about what we really need. So let's clarify, candidly, openly...

Must-Haves

The skills required for contributing within the team include:

Both attention to detail and curiosity as core personality traits
Python 3.x coding, testing, and packaging
- ideally: prior experience developing open source Python
  libraries that you've published on PyPI, and responding to
  feedback from people using those libraries.
Git, beyond the basics
- ideally: adept at using GitHub issues, pull requests,
  branching, submodules, commit hooks, and rebase, and at
  resolving merge conflicts.
Writing code and instructions that others can use
Building and managing data pipelines
- ideally: familiar with cleaning data, validating and
  correcting metadata, troubleshooting data-related errors,
  and coordinating distributed workflows.
Familiarity with software engineering process and team collaboration for developing and maintaining a code base
Previous experience on research projects
- using datasets obtained from other organizations
- statistical thinking
- how peer-reviewed research papers get published
- leveraging scholarly infrastructure such as PubMed,
  ResearchGate, OpenAIRE, RePEc, etc.

These cover the essential skills. Given these foundations, any reasonably good AI team can train an individual in situ on whatever else they need to learn.

Nice-to-Haves

Additionally, if you have any of the following skills to contribute, these help so much:

prior work in data governance and metadata exchange
using cloud-based infrastructure (AWS, GCP, Azure, etc.)
natural language understanding
W3C open standards for controlled vocabularies, OWL, SKOS, RDF
writing documentation in PyDoc
understanding the history and nuances of linked data
user testing
ways to leverage or improve human-in-the-loop approaches
recommender systems
prior work in social science research or public policy
entity linking
parsing PDFs
interactive network diagram visualization in PyVis, Vis.js
weak supervision
a knack for writing useful unit tests
feature engineering
performance monitoring and analysis
knowledge graph representation
web-app UI development using Flask, PureCSS, Vue.js
applications of graph algorithms
libraries: networkx, spaCy, Ray, Parsr, PyTorch, sklearn, datasketch
developing and testing graph embedding models
API development using OpenAPI (Swagger)
participation in machine learning competitions

Getting To Know You...

For the Rich Context team in particular, we'd like to see: code you've written in public repositories, articles you've published, and videos of talks that you've presented at meetups or conferences.

It helps especially if you've been involved with open source projects: committing code, developing tutorials, improving the documentation, participating in community forums and events, etc.

Also, what would you like to contribute here? For example, how would you enhance or extend the following projects?

https://github.com/Coleridge-Initiative/RCGraph
https://github.com/Coleridge-Initiative/RCServer
https://github.com/Coleridge-Initiative/RCApi
https://github.com/Coleridge-Initiative/rclc

Counterfactuals

We enjoy meeting with people who are eager to get busy working with real data, who are interested in practical approaches for machine learning and knowledge graph representation.

OTOH, to be clear, we don't need to spend time discussing...

GraphQL
preferred commercial software vendors
Schema.org
graph databases
specific IDEs
"TensorFlow can do everything"
JVM-based environments
social networks
React/Redux
SPARQL
gamification
"Yeah, but Google already solved Knowledge Graph"
ways to increase the frequency and length of scheduled meetings

Thank you very much.

Contact

https://coleridgeinitiative.org/richcontext
https://twitter.com/NYUColeridge
mailto:dataanalytics@coleridgeinitiative.org
