Thank you for your idea! Overall I would be open to adding something like this; one of the features cloc has over tokei is that it can operate on a tarball or zip file. However, I think I'd need to know more about the technical details of implementing such a change. Right now, I don't think I would want a bespoke virtual file system handled entirely inside tokei: we already use a third-party library for filesystem traversal (which I would like to keep using), it seems a bit out of scope for tokei, and it seems like something that would be useful for other projects. So if there's a way we can have this feature such that it integrates with …
-
I've needed to build something like this myself. Here's a start towards a solution using git2-rs:

```rust
use anyhow::{anyhow, Result};
use git2::Repository;
use std::ffi::OsStr;
use std::path::Path;
use tokei::{CodeStats, Config, LanguageType};

// TODO: This doesn't allow us to track types with shebangs. We need to read the blob for that.
// See the `blob.content()` piece below on the how - just something I didn't need right now.
fn file_language(path: &Path) -> Result<LanguageType> {
    let extension = path
        .extension()
        .and_then(OsStr::to_str)
        .unwrap_or_else(|| path.to_str().unwrap()); // This will allow for types like `Makefile`
    LanguageType::from_file_extension(&extension.to_lowercase())
        .ok_or_else(|| anyhow!("Could not map extension to language"))
}

fn code_stats_history_for_file(file_path: &Path, repo: &Repository) -> Result<Vec<CodeStats>> {
    let config = Config::default();
    let mut stats = Vec::new();
    let language = file_language(file_path)?;

    // Walk the history from the oldest commit to HEAD.
    let mut revwalk = repo.revwalk()?;
    revwalk.set_sorting(git2::Sort::REVERSE)?;
    revwalk.push_head()?;

    for commit_oid in revwalk {
        let commit_oid = commit_oid?;
        let commit = repo.find_commit(commit_oid)?;
        // Skip the initial commit since there's no parent to diff against.
        // Note that this check also skips merge commits (more than one parent).
        if commit.parent_count() == 1 {
            let previous_commit = commit.parent(0)?;
            let tree = commit.tree()?;
            let previous_tree = previous_commit.tree()?;
            let diff = repo.diff_tree_to_tree(Some(&previous_tree), Some(&tree), None)?;
            // Only record stats for commits that actually touched the file.
            let delta = diff.deltas().find_map(|d| {
                if d.new_file().path().unwrap_or_else(|| Path::new(".")) == file_path {
                    d.new_file().path()
                } else {
                    None
                }
            });
            if let Some(path) = delta {
                let file = tree.get_path(path)?;
                let obj = file.to_object(repo)?;
                if let Some(blob) = obj.as_blob() {
                    // Pass the config by reference so it can be reused on every iteration.
                    let code_stats =
                        LanguageType::parse_from_slice(language, blob.content(), &config);
                    stats.push(code_stats);
                } else {
                    return Err(anyhow!("Could not load blob from delta file path"));
                }
            }
        }
    }
    Ok(stats)
}
```

With inputs like:

```rust
let repo = Repository::open("repository/path")?;
let file_path = Path::new("file/path/in/repo.jl");
```
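For the shebang TODO, one possible starting point is to look at the first line of the blob bytes. This is only a sketch using the standard library: `shebang_interpreter` is a hypothetical helper name, and mapping the returned interpreter name (for example `python3`) to a tokei `LanguageType` is left out because I haven't checked which tokei API would fit in-memory content.

```rust
/// Extracts the interpreter name from a shebang line at the start of `content`.
/// Returns `None` if the content does not start with `#!` or the first line
/// is not valid UTF-8. (Hypothetical helper, not part of tokei or git2.)
fn shebang_interpreter(content: &[u8]) -> Option<&str> {
    // Only the first line can carry a shebang.
    let first_line = content.split(|&b| b == b'\n').next()?;
    let line = std::str::from_utf8(first_line).ok()?.trim();
    let rest = line.strip_prefix("#!")?;

    let mut parts = rest.split_whitespace();
    let program = parts.next()?;
    // Keep only the last path component, e.g. `/bin/bash` -> `bash`.
    let name = program.rsplit('/').next()?;

    if name == "env" {
        // `#!/usr/bin/env python3`: the interpreter is the first non-flag argument.
        parts.find(|arg| !arg.starts_with('-'))
    } else {
        Some(name)
    }
}
```

In the code above it could be called as `shebang_interpreter(blob.content())` when the extension lookup fails.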
-
Hypothetically, tokei could traverse git objects directly as a virtual file system (VFS) as of a particular ref. This would mean you could run tokei on historical revisions without changing any state on the filesystem. This is helpful in very large repos where checkout alone can take a long time, where the filesystem is immutable, or in bare git repos that don't create any physical files at all. This could even have performance benefits, because many small files are often combined into packfiles and compressed, meaning fewer disk accesses.
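To make the idea concrete, here is a rough sketch of reading files as of a ref without touching the working tree. It is not how tokei works today; it shells out to the `git` CLI (`git ls-tree` and `git cat-file`) from the Rust standard library, and the function names are made up for illustration. Entries that are not blobs, such as submodules, would make `read_blob_at` return an error.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Splits the NUL-separated output of `git ls-tree -r -z --name-only` into paths.
fn parse_ls_tree(output: &[u8]) -> Vec<String> {
    output
        .split(|&b| b == 0)
        .filter(|entry| !entry.is_empty())
        .map(|entry| String::from_utf8_lossy(entry).into_owned())
        .collect()
}

/// Runs `git -C <repo> <args...>` and returns stdout, or an error carrying
/// git's stderr if the command exits with a non-zero status.
fn run_git(repo: &Path, args: &[&str]) -> io::Result<Vec<u8>> {
    let output = Command::new("git").arg("-C").arg(repo).args(args).output()?;
    if output.status.success() {
        Ok(output.stdout)
    } else {
        Err(io::Error::new(
            io::ErrorKind::Other,
            String::from_utf8_lossy(&output.stderr).into_owned(),
        ))
    }
}

/// Lists every path in the tree at `rev` (a branch, tag, or commit id).
fn list_files_at(repo: &Path, rev: &str) -> io::Result<Vec<String>> {
    run_git(repo, &["ls-tree", "-r", "-z", "--name-only", rev]).map(|out| parse_ls_tree(&out))
}

/// Reads the content of `path` as of `rev` without checking anything out.
fn read_blob_at(repo: &Path, rev: &str, path: &str) -> io::Result<Vec<u8>> {
    let spec = format!("{}:{}", rev, path);
    run_git(repo, &["cat-file", "blob", spec.as_str()])
}
```

The bytes returned by `read_blob_at` could then be handed to a parser that accepts in-memory content. Since `git -C` only needs the repository directory, this should also work on a bare repo.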
I'm trying to script tokei to create a summary for every week of a repo's history, and I'm finding that managing git checkout state (waiting for checkout, restoring the original branch, trapping on early exit, etc.) takes more trouble and time than running tokei itself.
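For the weekly part, picking which commits to summarize can be done without any checkout at all. Below is a small sketch that keeps the latest commit in each 7-day bucket. It assumes the input is a list of (committer timestamp in Unix seconds, commit id) pairs, for example parsed from `git log --format="%ct %H"`, and that "week" simply means a 7-day window counted from the Unix epoch rather than a calendar week. `last_commit_per_week` is a hypothetical name.

```rust
use std::collections::BTreeMap;

const SECONDS_PER_WEEK: i64 = 7 * 24 * 60 * 60;

/// Returns the latest commit in each 7-day bucket, ordered from the oldest
/// week to the newest. The input does not need to be sorted, since commit
/// timestamps are not guaranteed to increase along the history.
fn last_commit_per_week(commits: &[(i64, String)]) -> Vec<(i64, String)> {
    // Keyed by week index so the result comes out in chronological order.
    let mut by_week: BTreeMap<i64, (i64, String)> = BTreeMap::new();
    for (timestamp, id) in commits {
        let week = timestamp.div_euclid(SECONDS_PER_WEEK);
        // Replace the stored commit only if this one is strictly newer.
        let is_newer = by_week
            .get(&week)
            .map_or(true, |(existing, _)| timestamp > existing);
        if is_newer {
            by_week.insert(week, (*timestamp, id.clone()));
        }
    }
    by_week.into_values().collect()
}
```

Each selected commit id can then be inspected directly from the object database instead of being checked out.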
Thoughts?