    Repositories list

    • PanzaMail

      Public
      Python
      Apache License 2.0
      1425634 Updated Oct 18, 2024
    • torch_cgx

      Public
      PyTorch distributed backend extension with compression support
      C++
      GNU Affero General Public License v3.0
      01540 Updated Oct 17, 2024
    • Python
      Apache License 2.0
      0600 Updated Sep 5, 2024
    • Boosting 4-bit inference kernels with 2:4 Sparsity
      Cuda
      Apache License 2.0
      24910 Updated Sep 4, 2024
    • marlin

      Public
      FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.
      Python
      Apache License 2.0
      45594244 Updated Sep 4, 2024
    • sparsegpt

      Public
      Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
      Python
      Apache License 2.0
      94707141 Updated Aug 20, 2024
    • peft-rosa

      Public
      A fork of the PEFT library, supporting Robust Adaptation (RoSA)
      Python
      Apache License 2.0
      31310 Updated Aug 16, 2024
    • LLM training code for Databricks foundation models
      Python
      Apache License 2.0
      524001 Updated Jul 24, 2024
    • GridSearcher simplifies running grid searches for machine learning projects in Python, emphasizing parallel execution and GPU scheduling without dependencies on SLURM or other workload managers.
      Python
      Apache License 2.0
      0200 Updated Jul 22, 2024
    • MicroAdam

      Public
      This repository contains code for the MicroAdam paper.
      Python
      Apache License 2.0
      31100 Updated Jun 28, 2024
    • Python
      MIT License
      0000 Updated Jun 27, 2024
    • spops

      Public
      C++
      Apache License 2.0
      0610 Updated Jun 20, 2024
    • Code for the EMNLP 2024 paper "Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on LLMs".
      Python
      Apache License 2.0
      0610 Updated Jun 18, 2024
    • SPADE

      Public
      Code of SPADE: Sparsity Guided Debugging for Deep Neural Networks
      Jupyter Notebook
      0000 Updated May 25, 2024
    • QUIK

      Public
      Repository for the QUIK project, enabling the use of 4bit kernels for generative inference
      C++
      Apache License 2.0
      1216851 Updated Apr 16, 2024
    • FastOBQ-

      Public
      GPTQ with finetuning
      0000 Updated Mar 27, 2024
    • gptq

      Public
      Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
      Python
      Apache License 2.0
      1521.9k211 Updated Mar 27, 2024
    • RoSA

      Public
      Python
      Apache License 2.0
      23410 Updated Feb 13, 2024
    • Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry
      Python
      Apache License 2.0
      53820 Updated Jan 15, 2024
    • CAP

      Public
      Repository for Correlation Aware Prune (NeurIPS23) source and experimental code
      Python
      Apache License 2.0
      1500 Updated Nov 29, 2023
    • qmoe

      Public
      Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
      Python
      Apache License 2.0
      2226030 Updated Nov 3, 2023
    • ZipLM

      Public
      Code for the NeurIPS 2023 paper: "ZipLM: Inference-Aware Structured Pruning of Language Models".
      0110 Updated Oct 20, 2023
    • TACO4NLP

      Public
      Task aware compression for various NLP tasks
      Python
      0200 Updated Oct 9, 2023
    • KDVR

      Public
      Code for the experiments in Knowledge Distillation Performs Partial Variance Reduction, NeurIPS 2023
      Python
      Apache License 2.0
      0100 Updated Oct 6, 2023
    • C++
      Apache License 2.0
      41300 Updated Sep 27, 2023
    • EFCP

      Public
      The repository contains code to reproduce the experiments from our paper Error Feedback Can Accurately Compress Preconditioners available below:
      Python
      Apache License 2.0
      0400 Updated Sep 12, 2023
    • QIGen

      Public
      Repository for CPU Kernel Generation for LLM Inference
      Python
      22400 Updated Jul 13, 2023
    • OBC

      Public
      Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning".
      Python
      149930 Updated Jul 11, 2023
    • Code for reproducing the paper "Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures"
      Jupyter Notebook
      1400 Updated Jun 22, 2023
    • spdy

      Public
      Code for ICML 2022 paper "SPDY: Accurate Pruning with Speedup Guarantees"
      Python
      41810 Updated May 3, 2023