The following isn't an offer for a job opening, nor does it represent hiring policy for any particular organization.
That said, if you're interested in joining a team such as the one that produces the Rich Context knowledge graph, here are a few notes about what kind of skills are needed most.
Sometimes a strange notion circulates among people in management: that if a team can simply hire someone with the title "data scientist" or "machine learning engineer" and have them work in the cloud, using Agile and expensive proprietary vendor platforms ... then somehow magical things will result. Clearly, that's a misunderstanding.
Speaking of misunderstandings, the most valuable skills for this kind of work probably don't align with current data science curricula in universities and related programs. Aside from a few graduate computer science programs (and even fewer R&D organizations in industry), these skills aren't taught much.
That raises a question: what is needed? We're eager to build a diverse and inclusive team, and to help more people learn what we really need. So let's clarify, candidly and openly...
The skills required for contributing within the team include:
- Both attention to detail and curiosity as core personality traits
- Python 3.x coding, testing, and packaging
  - ideally: prior experience developing open source Python libraries which you've deployed on PyPI, and responding to feedback from people using those libraries.
- Git, beyond the basics
  - ideally: adept at using GitHub issues, pull requests, branching, submodules, commit hooks, and rebase, and at resolving merge conflicts.
- Writing code and instructions that others can use
- Building and managing data pipelines
  - ideally: familiar with cleaning data, validating and correcting metadata, troubleshooting data-related errors, and coordinating distributed workflows.
- Familiarity with software engineering process and team collaboration for developing and maintaining a code base
- Previous experience on research projects
  - using datasets obtained from other organizations
  - statistical thinking
  - how peer-reviewed research papers get published
  - leveraging scholarly infrastructure such as PubMed, ResearchGate, OpenAIRE, RePEc, etc.
These cover the essential skills. Given these foundations, any reasonably good AI team can train in situ for whatever else an individual needs to acquire.
Additionally, if you have any of the following skills to contribute, these help so much:
- prior work in data governance and metadata exchange
- using cloud-based infrastructure (AWS, GCP, Azure, etc.)
- natural language understanding
- W3C open standards for controlled vocabularies, OWL, SKOS, RDF
- writing documentation in PyDoc
- understanding the history and nuances of linked data
- user testing
- ways to leverage or improve human-in-the-loop approaches
- recommender systems
- prior work in social science research or public policy
- entity linking
- parsing PDFs
- interactive network diagram visualization in PyVis, Vis.js
- weak supervision
- a knack for writing useful unit tests
- feature engineering
- performance monitoring and analysis
- knowledge graph representation
- web-app UI development using Flask, PureCSS, Vue.js
- applications of graph algorithms
- libraries: networkx, spaCy, Ray, Parsr, PyTorch, sklearn, datasketch
- developing and testing graph embedding models
- API development using OpenAPI (Swagger)
- participation in machine learning competitions
For the Rich Context team in particular, we'd like to see: code you've written in public repositories, articles you've published, and videos of talks you've presented at meetups or conferences.
It helps especially if you've been involved with open source projects: committing code, developing tutorials, improving the documentation, participating in community forums and events, etc.
Also, what would you like to contribute here? For example, how would you enhance or extend the following projects?
- https://github.com/Coleridge-Initiative/RCGraph
- https://github.com/Coleridge-Initiative/RCServer
- https://github.com/Coleridge-Initiative/RCApi
- https://github.com/Coleridge-Initiative/rclc
We enjoy meeting with people who are eager to get busy working with real data, who are interested in practical approaches for machine learning and knowledge graph representation.
OTOH, to be clear, we don't need to spend time discussing...
- GraphQL
- preferred commercial software vendors
- Schema.org
- graph databases
- specific IDEs
- "TensorFlow can do everything"
- JVM-based environments
- social networks
- React/Redux
- SPARQL
- gamification
- "Yeah, but Google already solved Knowledge Graph"
- ways to increase the frequency and length of scheduled meetings
Thank you very much.