Skip to content

Replace libgit2 with gitoxide (#2)#3703

Open
blinxen wants to merge 7 commits intosharkdp:masterfrom
blinxen:master
Open

Replace libgit2 with gitoxide (#2)#3703
blinxen wants to merge 7 commits intosharkdp:masterfrom
blinxen:master

Conversation

@blinxen
Copy link
Copy Markdown

@blinxen blinxen commented Apr 26, 2026

This is a continuation of #3235

This time all tests pass without any changes :D Looks promising!

Context: This PR replaces the usage of libgit2 with a full Rust implementation of git. This was tried before but a couple of issues were found in the Rust implementation of git. These issues seem to be fixed now. Therefore, I am opening this again and would like to see what the maintainers think.

CC: @Byron

@Byron
Copy link
Copy Markdown

Byron commented Apr 26, 2026

This is great to see, thanks so much 🙏.

The one thing that seems critical here is to have an example of a diff that (now) looks the same in bat as it looks with git, and ideally different with the previous implementation based on an older gix.

Unfortunately, whenever I try to find such an example, I come up empty 🤦‍♂️.

And before I forget please do set the default diff.algorithm to histogram, via gix::open::Options::isolated().config_overrides(Some("diff.algorithm=histogram")). It is very good and most importantly doesn't seem to be attackable, so cannot be tricked into long run times like myers. For context, the gix- merge crate was fuzzed and the fuzzer was going after the diff step which picked up Myers as default, causing diff-times of 1s which ultimately timed out certain fuzzer runs. histogram took 50ms for the same input.

@blinxen
Copy link
Copy Markdown
Author

blinxen commented Apr 26, 2026

The one thing that seems critical here is to have an example of a diff that (now) looks the same in bat as it looks with git

This should already be proven by the tests, even though it is not that big of a diff.

ideally different with the previous implementation based on an older gix

Previous PR should show this difference since it required me to change some test files.

And before I forget please do set the default diff.algorithm to histogram, via gix::open::Options::isolated().config_overrides(Some("diff.algorithm=histogram")). It is very good and most importantly doesn't seem to be attackable, so cannot be tricked into long run times like myers. For context, the gix- merge crate was fuzzed and the fuzzer was going after the diff step which picked up Myers as default, causing diff-times of 1s which ultimately timed out certain fuzzer runs. histogram took 50ms for the same input.

👍

@blinxen
Copy link
Copy Markdown
Author

blinxen commented Apr 26, 2026

Is there actually any need to call gix::open::Options::isolated().config_overrides(Some("diff.algorithm=histogram"))? We are calling diff_with_slider_heuristics directly and we pass the algorithm there.

@Byron
Copy link
Copy Markdown

Byron commented Apr 26, 2026

Then there is no need, it can be set there.

My clanker came up with something that seems to be worth taking a look at.

  • Restored the documented index-vs-worktree semantics by loading the old diff resource from the Git index entry rather than HEAD.

This might be wrong, but if so I think it makes sense to document the current behaviour more heavily. The new regression tests seem to cover the altered behaviour though.

I only took a quick look, but enough to hand it off for more diligence.

Review by GPT 5.5

Resolved the review findings in src/diff.rs.

  • Replaced gix::discover(...).unwrap() with fallible handling so --diff/change markers outside a Git repository return None instead of crashing.
  • Restored the documented index-vs-worktree semantics by loading the old diff resource from the Git index entry rather than HEAD.
  • Kept the worktree file as the new diff resource.
  • Added regression coverage for:
    • ordinary files outside Git repositories,
    • staged changes producing no worktree-vs-index markers,
    • newly added indexed files with further unstaged edits.

Validation run:

cargo fmt --check
cargo test diff::tests --features git
cargo check --features git
cargo test diff_plain --features git

Patch

diff --git a/src/diff.rs b/src/diff.rs
index 07b79dc4..cbb2d8d4 100644
--- a/src/diff.rs
+++ b/src/diff.rs
@@ -2,14 +2,12 @@
 
 use gix::diff::blob::pipeline::{Mode, WorktreeRoots};
 use gix::diff::blob::{Algorithm, HunkIter, ResourceKind};
-use gix::index::hash::Kind;
 use gix::object::tree::EntryKind;
-use gix::{self, ObjectId};
 use path_abs::PathInfo;
 use std::collections::HashMap;
 use std::path::Path;
 
-#[derive(Copy, Clone, Debug)]
+#[derive(Copy, Clone, Debug, PartialEq, Eq)]
 pub enum LineChange {
     Added,
     RemovedAbove,
@@ -44,9 +42,13 @@ fn collect_changes_from_hunks(hunks: HunkIter) -> Option<LineChanges> {
 
 pub fn get_git_diff(filename: &Path) -> Option<LineChanges> {
     let filepath_absolute = filename.canonicalize().ok()?;
-    let repository = gix::discover(filepath_absolute.parent().ok()?).unwrap();
+    let repository = gix::discover(filepath_absolute.parent().ok()?).ok()?;
     let repo_path_absolute = repository.workdir()?.canonicalize().ok()?;
     let filepath_relative_to_repo = filepath_absolute.strip_prefix(&repo_path_absolute).ok()?;
+    let filepath_relative_to_repo =
+        gix::path::to_unix_separators_on_windows(gix::path::into_bstr(filepath_relative_to_repo));
+    let index = repository.index_or_load_from_head_or_empty().ok()?;
+    let index_entry = index.entry_by_path(filepath_relative_to_repo.as_ref())?;
     let mut cache = repository
         .diff_resource_cache(
             Mode::ToGit,
@@ -58,23 +60,18 @@ pub fn get_git_diff(filename: &Path) -> Option<LineChanges> {
         .ok()?;
     cache
         .set_resource(
-            repository
-                .head_tree()
-                .ok()?
-                .lookup_entry_by_path(filepath_relative_to_repo.to_str()?)
-                .ok()??
-                .object_id(),
-            EntryKind::Blob,
-            filepath_relative_to_repo.to_str()?.into(),
+            index_entry.id,
+            index_entry.mode.to_tree_entry_mode()?.kind(),
+            filepath_relative_to_repo.as_ref(),
             ResourceKind::OldOrSource,
             &repository,
         )
         .ok()?;
     cache
         .set_resource(
-            ObjectId::null(Kind::Sha1),
+            repository.object_hash().null(),
             EntryKind::Blob,
-            filepath_relative_to_repo.to_str()?.into(),
+            filepath_relative_to_repo.as_ref(),
             ResourceKind::NewOrDestination,
             &repository,
         )
@@ -86,3 +83,72 @@ pub fn get_git_diff(filename: &Path) -> Option<LineChanges> {
 
     collect_changes_from_hunks(diff.hunks())
 }
+
+#[cfg(test)]
+mod tests {
+    use super::{get_git_diff, LineChange};
+    use std::path::Path;
+    use std::process::Command;
+
+    fn git(repo: &Path, args: &[&str]) {
+        let output = Command::new("git")
+            .args(args)
+            .current_dir(repo)
+            .output()
+            .expect("git command can run");
+        assert!(
+            output.status.success(),
+            "git {args:?} failed\nstdout: {}\nstderr: {}",
+            String::from_utf8_lossy(&output.stdout),
+            String::from_utf8_lossy(&output.stderr)
+        );
+    }
+
+    fn setup_repo() -> tempfile::TempDir {
+        let dir = tempfile::tempdir().expect("can create temporary directory");
+        let repo = dir.path();
+        git(repo, &["init"]);
+        git(repo, &["config", "user.email", "test@test.test"]);
+        git(repo, &["config", "user.name", "Test"]);
+        dir
+    }
+
+    #[test]
+    fn no_git_diff_outside_repository() {
+        let dir = tempfile::tempdir().expect("can create temporary directory");
+        let file = dir.path().join("file.txt");
+        std::fs::write(&file, "line 1\n").expect("can write file");
+
+        assert_eq!(get_git_diff(&file), None);
+    }
+
+    #[test]
+    fn git_diff_compares_worktree_to_index() {
+        let repo = setup_repo();
+        let file = repo.path().join("file.txt");
+        std::fs::write(&file, "line 1\nline 2\n").expect("can write file");
+        git(repo.path(), &["add", "file.txt"]);
+        git(repo.path(), &["commit", "-m", "initial"]);
+
+        std::fs::write(&file, "line 1\nline 2 modified\n").expect("can write file");
+        git(repo.path(), &["add", "file.txt"]);
+
+        assert_eq!(get_git_diff(&file), Some(Default::default()));
+    }
+
+    #[test]
+    fn git_diff_uses_index_for_newly_added_files() {
+        let repo = setup_repo();
+        std::fs::write(repo.path().join("committed.txt"), "committed\n").expect("can write file");
+        git(repo.path(), &["add", "committed.txt"]);
+        git(repo.path(), &["commit", "-m", "initial"]);
+
+        let file = repo.path().join("new.txt");
+        std::fs::write(&file, "line 1\nline 2\n").expect("can write file");
+        git(repo.path(), &["add", "new.txt"]);
+        std::fs::write(&file, "line 1\nline 2 modified\n").expect("can write file");
+
+        let changes = get_git_diff(&file).expect("newly added file has an index entry");
+        assert_eq!(changes.get(&2), Some(&LineChange::Modified));
+    }
+}

@blinxen
Copy link
Copy Markdown
Author

blinxen commented Apr 27, 2026

Not sure if we need to care about windows paths (I expected gix to handle it) but I applied the suggestion from GPT. I also liked the tests it added, it was a good start. I think I covered most standard cases.

Comment thread src/diff.rs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants