A learning repository dedicated to the Julia language. This short course covers core data science techniques and implements them 'by hand' wherever possible.
The goal is to teach elegant customized programming that scales.
- How the Digital Age differs from prior eras, and why it makes data science skills mandatory.
The Digital Age, also known as the Information Age, is characterized by a shift from the traditional industry brought about by the Industrial Revolution to an economy based on information technology. It is generally considered to have begun in the latter half of the 20th century, and the transition was gradual, marked by a series of technological advances rather than a single event. The invention of the transistor at Bell Labs was foundational: its subsequent miniaturization led to microprocessors, which made computing widely accessible. This burst in computing power might have had a more muted effect had it not been for the development of the Internet, beginning in the late 1960s and 1970s, which enabled the sharing of data across computers.
The era is marked by the widespread adoption of computers, the Internet, and digital technologies that transform how we create, store, share, and analyze information.
- Basic occupational definitions: Data Science, Machine Learning Engineer, Data Engineer - what are these jobs?
- Basic tooling terms: Terminal/bash, API, SQL, Docker, IDE.
- There is so much information, but it's hard to know what is worth believing. To better understand the world, we collect data, and a lot of it.
- Birdwatch example (misinformation): it's hard to know which tweets you can trust, but we know that people react to tweets, so perhaps we can use those reactions to figure out whether a tweet is trustworthy. Birdwatch uses raters and notes, which raises the question: how do we know which notes are reliable?
- Looking at the matrix of notes and raters, and creating a note quality score.
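
To make the note quality score concrete, here is a minimal sketch in Julia. It assumes a small made-up ratings matrix (rows are notes, columns are raters) and uses the simplest possible score, the mean of the ratings each note received. The data, the 0/1 encoding, and the name `note_score` are all illustrative assumptions; this is not the scoring method Birdwatch itself uses.

```julia
using Statistics

# Hypothetical ratings matrix: rows are notes, columns are raters.
# 1.0 = rated helpful, 0.0 = rated not helpful, missing = no rating given.
ratings = [1.0      1.0      missing  1.0;
           0.0      missing  0.0      1.0;
           missing  1.0      1.0      missing]

# Naive quality score: the mean of the ratings a note actually received.
note_score(row) = mean(skipmissing(row))

scores = [note_score(ratings[i, :]) for i in axes(ratings, 1)]

# How many ratings each score is based on; fewer ratings means less evidence.
n_ratings = [count(!ismissing, ratings[i, :]) for i in axes(ratings, 1)]

for i in axes(ratings, 1)
    println("note ", i, ": score = ", round(scores[i], digits = 2),
            " from ", n_ratings[i], " ratings")
end
```

A plain average treats every rater as equally reliable, which is exactly the weakness the reliability question above points at; it is only a starting point for the discussion.
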
- Julia programming (loops, types, broadcasting, regex, CSV, DataFrames, functions, packages, compilation and speed).
- SQL (select, group by, where, join)
- Bash (cd, ls, mv, ssh, pip)
- Docker (build, run, push)
- Which IDE you need (VS Code)
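
As a taste of the Julia topics listed above, the following sketch touches types, functions, loops, broadcasting, and regex using only base Julia. The `Tweet` struct, the `total_length` function, and the sample data are hypothetical and exist only for illustration.

```julia
# A struct with typed fields; concrete field types help Julia compile fast code.
struct Tweet
    id::Int
    text::String
end

# A function with an explicit loop: total number of characters across tweets.
function total_length(tweets::Vector{Tweet})
    total = 0
    for t in tweets
        total += length(t.text)
    end
    return total
end

tweets = [Tweet(1, "Julia is fast #julialang"), Tweet(2, "No tags here")]

# Broadcasting: the dot applies `length` to every element of the vector.
lengths = length.([t.text for t in tweets])

# Regex: collect every hashtag in the first tweet.
hashtags = [m.match for m in eachmatch(r"#\w+", tweets[1].text)]

println("total characters: ", total_length(tweets))
println("lengths: ", lengths)
println("hashtags: ", hashtags)
```

CSV and DataFrames are left out here because they require installing packages; everything above runs in a fresh Julia session.
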
- Loading a csv and looking at key observations
- Plotting with Gadfly
- Basic understanding of statistics, distributions
- Concepts: distributions, mean, variance, differences in means, t-tests, correlation, causal inference
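
In the 'by hand' spirit of the course, several of these concepts can be written directly from their formulas. The sketch below computes the sample mean, the unbiased sample variance, a Welch-style t statistic for a difference in means, and the Pearson correlation. The function names (`my_mean`, `my_var`, `welch_t`, `my_cor`) and the toy data are assumptions made for this example, and the choice of Welch's form of the t statistic is one option among several.

```julia
# Sample mean and unbiased sample variance, written out from their definitions.
my_mean(x) = sum(x) / length(x)
my_var(x) = sum((x .- my_mean(x)) .^ 2) / (length(x) - 1)

# Welch's t statistic: difference in means divided by its standard error.
function welch_t(x, y)
    se = sqrt(my_var(x) / length(x) + my_var(y) / length(y))
    return (my_mean(x) - my_mean(y)) / se
end

# Pearson correlation: covariance scaled by both standard deviations.
function my_cor(x, y)
    dx = x .- my_mean(x)
    dy = y .- my_mean(y)
    return sum(dx .* dy) / sqrt(sum(dx .^ 2) * sum(dy .^ 2))
end

# Toy data: b is a shifted by one, so the two are perfectly correlated.
a = [2.0, 4.0, 6.0, 8.0]
b = [1.0, 3.0, 5.0, 7.0]

println("mean(a) = ", my_mean(a))
println("var(a)  = ", my_var(a))
println("t(a, b) = ", welch_t(a, b))
println("cor(a, b) = ", my_cor(a, b))
```

Turning the t statistic into a p-value needs the t distribution's cumulative distribution function, which is not covered by this sketch. The results can be checked against `mean`, `var`, and `cor` from Julia's `Statistics` standard library.
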
- Machine learning models and their forms
- Perceptron
- DNN (deep neural network)
- RNN (recurrent neural network)
- CNN (convolutional neural network)
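
The perceptron is the simplest of these models and is small enough to implement by hand. The sketch below trains one on the logical AND function using the classic perceptron update rule. The function names, the learning rate, the epoch count, and the toy data are assumptions chosen for illustration.

```julia
# Prediction: output 1 if the weighted sum plus bias is positive, else 0.
predict(w, b, x) = (sum(w .* x) + b) > 0 ? 1 : 0

# Perceptron learning rule: nudge the weights toward each misclassified example.
function train_perceptron(X, y; epochs = 20, lr = 1.0)
    w = zeros(size(X, 2))   # one weight per input feature
    b = 0.0                 # bias term
    for _ in 1:epochs
        for i in axes(X, 1)
            x = X[i, :]
            err = y[i] - predict(w, b, x)   # 0 when the prediction is correct
            w .+= lr * err .* x
            b += lr * err
        end
    end
    return w, b
end

# Toy data: the logical AND of two binary inputs (linearly separable).
X = [0.0 0.0; 0.0 1.0; 1.0 0.0; 1.0 1.0]
y = [0, 0, 0, 1]

w, b = train_perceptron(X, y)
preds = [predict(w, b, X[i, :]) for i in axes(X, 1)]
println("weights = ", w, ", bias = ", b)
println("predictions = ", preds)
```

A single perceptron can only separate classes with a straight line (or hyperplane), which is why it succeeds on AND but would fail on XOR; stacking layers, as in a DNN, removes that limitation.
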
- Basic differences between ordinary neural networks and what drives generative AI.
- Calling the OpenAI API to make an app.
- Querying documents.
- Bash
- Docker
- SSH
- GitHub
- SQL