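With large language models (LLMs) developing rapidly, developers face an overwhelming number of resources and tools to choose from. Filtering and using those resources efficiently has become a key task for every LLM developer. The GitHub repository introduced here, LLM Engineer Toolkit, may be a useful aid.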
https://github.com/KalyanKS-NLP/llm-engineer-toolkit
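Created by KalyanKS-NLP, the repository curates more than 120 LLM-related libraries and groups them by category. Whether you need tools for training, inference, application development, data extraction, or safety evaluation, you can find them here.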
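Tool categories
- 🚀 LLM Training: tools for training and fine-tuning LLMs, helping you optimize models faster and more efficiently.
- 🧱 LLM Application Development: frameworks, multi-API access, caching, and low-code development, covering the full application development workflow.
- 🩸 LLM RAG: libraries for Retrieval-Augmented Generation, which improve a model's ability to retrieve knowledge.
- 🟩 LLM Inference: tools for accelerating and optimizing inference so that models run more smoothly.
- 🚧 LLM Serving: solutions for model deployment and inference serving.
- 📤 LLM Data Extraction: data extraction tools that help you obtain high-quality data from a variety of sources.
- 🌠 LLM Data Generation: tools for generating synthetic data to enrich your training set.
- 💎 LLM Agents: tools for building intelligent agents that automate tasks and collaborate in multi-agent setups.
- ⚖️ LLM Evaluation: evaluation tools for checking that model performance meets expectations.
- 🔍 LLM Monitoring: tools for monitoring a model in operation so that problems are found and fixed promptly.
- 📅 LLM Prompts: tools for optimizing and managing prompts to improve output quality.
- 📝 LLM Structured Outputs: tools for generating structured outputs so that model results are easier to use.
- 🛑 LLM Safety and Security: tools for keeping models safe and reliable.
- 💠 LLM Embedding Models: advanced text embedding models.
- ❇️ Others: other practical tools covering additional development scenarios.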
LLM Training and Fine-Tuning
- Fine-tune LLMs faster with less memory.
- State-of-the-art Parameter-Efficient Fine-Tuning library.
- Train transformer language models with reinforcement learning.
- Transformers provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio.
- Tool designed to streamline post-training for various AI models.
- A comprehensive library for implementing LLMs, including a unified training pipeline and comprehensive model evaluation.
- Train and fine-tune LLMs lightning fast.
- A library for easily merging multiple LLM experts and efficiently training the merged LLM.
- Easy and efficient LLM fine-tuning.
- Low-code framework for building custom LLMs, neural networks, and other AI models.
- A framework for training instruction-tuned models.
- An integrated LLM inference and tuning platform.
- xTuring provides fast, efficient and simple fine-tuning of open-source LLMs, such as Mistral, LLaMA, GPT-J, and more.
- A modular RL library to fine-tune language models to human preferences.
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
- A PyTorch-native library specifically designed for fine-tuning LLMs.
- A library that offers a high-level interface for pretraining and fine-tuning LLMs.
LLM Application Development
Frameworks
- LangChain is a framework for developing applications powered by large language models (LLMs).
- LlamaIndex is a data framework for your LLM applications.
- Haystack is an end-to-end LLM framework that allows you to build applications powered by LLMs, Transformer models, vector search and more.
- A suite of development tools designed to streamline the end-to-end development cycle of LLM-based AI applications.
- A modular Python framework for building AI-powered applications.
- Weave is a toolkit for developing Generative AI applications.
Data Preparation
- Data Prep Kit accelerates unstructured data preparation for LLM app developers. Developers can use Data Prep Kit to cleanse, transform, and enrich use case-specific unstructured data to pre-train LLMs, fine-tune LLMs, instruct-tune LLMs, or build RAG applications.
Multi API Access
- Library to call 100+ LLM APIs in OpenAI format.
- A Blazing Fast AI Gateway with integrated Guardrails. Route to 200+ LLMs, 50+ AI Guardrails with 1 fast & friendly API.
Routers
- Framework for serving and evaluating LLM routers: save LLM costs without compromising quality. Drop-in replacement for OpenAI's client to route simpler queries to cheaper models.
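To make the idea of routing simpler queries to cheaper models concrete, here is a minimal sketch. It is not how any listed library works: the model labels, the keyword list, and the word-count threshold are all hypothetical, and a real router would typically use a learned classifier rather than this crude heuristic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str   # hypothetical model label, not a real endpoint
    reason: str  # why this model was chosen


# Hypothetical labels standing in for a cheap and an expensive model.
CHEAP_MODEL = "small-model"
STRONG_MODEL = "large-model"

# Words that, in this toy heuristic only, hint at a harder query.
HARD_HINTS = {"prove", "derive", "refactor", "debug", "optimize", "analyze"}


def route_query(query: str, max_easy_words: int = 20) -> Route:
    """Pick a model label using a crude difficulty heuristic."""
    words = query.lower().split()
    # Strip trailing punctuation before comparing against the hint list.
    if any(w.strip(".,?!:;") in HARD_HINTS for w in words):
        return Route(STRONG_MODEL, "contains a hard-task keyword")
    if len(words) > max_easy_words:
        return Route(STRONG_MODEL, "long query")
    return Route(CHEAP_MODEL, "short query with no hard-task keyword")
```

Calling `route_query("What is the capital of France?")` selects the cheap label, while a query containing "debug" selects the strong one. The cost saving comes from how often the cheap path is taken without hurting answer quality, which is the trade-off a real router has to measure.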
Memory
- The Memory layer for your AI apps.
- An AI memory layer with short- and long-term storage, semantic clustering, and optional memory decay for context-aware applications.
- An open-source framework for building stateful LLM applications with advanced reasoning capabilities and transparent long-term memory.
- A user profile-based memory system designed to bring long-term user memory to your Generative AI applications.
Interface
- A faster way to build and share data apps. Streamlit lets you transform Python scripts into interactive web apps in minutes.
- Build and share delightful machine learning apps, all in Python.
- Build chat and generative user interfaces.
- Create AI apps powered by various AI providers.
- Python package for easily interfacing with chat apps, with robust features and minimal code complexity.
- Build production-ready Conversational AI applications in minutes.
Low Code
- LangFlow is a low-code app builder for RAG and multi-agent AI applications. It’s Python-based and agnostic to any model, API, or database.
Cache
- A Library for Creating Semantic Cache for LLM Queries. Slash Your LLM API Costs by 10x 💰, Boost Speed by 100x. Fully integrated with LangChain and LlamaIndex.
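A semantic cache returns a stored answer when a new query is close enough in meaning to one seen before, instead of calling the LLM again. The sketch below illustrates the lookup logic only. It assumes a toy word-count "embedding" and a hypothetical `SemanticCache` class; a real semantic cache would use a proper embedding model and a vector store, and nothing here reflects the API of the library listed above.

```python
import math
import re
from collections import Counter
from typing import List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Hypothetical in-memory cache keyed by query similarity."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self.entries: List[Tuple[Counter, str]] = []

    def put(self, query: str, answer: str) -> None:
        self.entries.append((embed(query), answer))

    def get(self, query: str) -> Optional[str]:
        """Return the best cached answer, or None on a cache miss."""
        q = embed(query)
        best_score, best_answer = 0.0, None
        for vec, answer in self.entries:
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None
```

On a miss (`get` returns `None`) the application would call the model and then `put` the new answer. The threshold controls the trade-off: a lower value saves more calls but risks returning an answer to a different question.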
LLM RAG
- Streamlined and promptable Fast GraphRAG framework designed for interpretable, high-precision, agent-driven retrieval workflows.
- RAG chunking library that is lightweight, lightning-fast, and easy to use.
- A Fine-grained Framework For Diagnosing RAG.
- Build, scale, and deploy state-of-the-art Retrieval-Augmented Generation applications.
- Beyond LLM offers an all-in-one toolkit for experimentation, evaluation, and deployment of Retrieval-Augmented Generation (RAG) systems.
- A vector search SQLite extension that runs anywhere!
- fastRAG is a research framework for efficient and optimized retrieval-augmented generative pipelines, incorporating state-of-the-art LLMs and Information Retrieval.
- A Python Toolkit for Efficient RAG Research.
- Unified framework for building enterprise RAG pipelines with small, specialized models.
- A lightweight unified API for various reranking models.
- Build Agentic RAG applications.
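Chunking, which one of the libraries above specializes in, is the step that splits a document into pieces small enough to embed and retrieve. As a rough illustration, the following sketch splits text into fixed-size word windows with overlap. The function name and parameters are hypothetical, and the listed libraries offer more sophisticated strategies (token-aware or semantic chunking, for example).

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> List[str]:
    """Split text into chunks of `chunk_size` words, sharing `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing some words twice.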
LLM Inference
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment.
- Python-based LLM inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
- High-throughput and memory-efficient inference and serving engine for LLMs.
- Run PyTorch LLMs locally on servers, desktop, and mobile.
- TensorRT-LLM is a library for optimizing Large Language Model (LLM) inference.
- High-performance In-browser LLM Inference Engine.
LLM Serving
- Serving LangChain LLM apps and agents automagically with FastAPI.
- Lightning-fast serving engine for any AI model of any size. It augments FastAPI with features like batching, streaming, and GPU autoscaling.
LLM Data Extraction
- Open-source LLM Friendly Web Crawler & Scraper.
- A web scraping Python library that uses LLM and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.).
- Docling parses documents and exports them to the desired format with ease and speed.
- GenAI-native document parser that can parse complex document data for any downstream LLM use case (RAG, agents).
- PyMuPDF4LLM library makes it easier to extract PDF content in the format you need for LLM & RAG environments.
- A web scraping and browser automation library.
- Parser for every type of document.
- Document Intelligence library for LLMs.
LLM Data Generation
- DataDreamer is a powerful open-source Python library for prompting, synthetic data generation, and training workflows.
- A flexible open-source framework to generate datasets with large language models.
- Synthetic Dataset Generation Library.
- An Easy-to-use Instruction Processing Framework for Large Language Models.
LLM Agents
- Framework for orchestrating role-playing, autonomous AI agents.
- Build resilient language agents as graphs.
- Build AI Agents with memory, knowledge, tools, and reasoning. Chat with them using a beautiful Agent UI.
- Build agentic apps using LLMs with context, tools, hand off to other specialized agents.
- An open-source framework for building AI agent systems.
- Library to build powerful agents in a few lines of code.
- Python agent framework to build production grade applications with Generative AI.
- Build production-ready multi-agent systems in Python.
- A Python library for converting Gradio apps into tools that can be leveraged by an LLM-based agent to complete its task.
- Production Ready Toolset for AI Agents.
- Building AI agents, atomically.
- Open Source Memory Layer For Autonomous Agents.
- Make websites accessible for AI agents.
- An Open Toolkit to Enable Web Agents on Large Language Models.
- A lightweight framework for building LLM-based agents.
- A Low-code Development Tool For Building Multi-agent LLMs Applications.
- The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework.
- ChatArena is a library that provides multi-agent language game environments and facilitates research about autonomous LLM agents and their social interactions.
- Educational framework exploring ergonomic, lightweight multi-agent orchestration.
- The fastest way to build robust AI agents.
- Intelligent gateway for Agents.
- A lightweight task engine for building AI agents.
- Python SDK for AI agent monitoring.
- Framework for creating and managing simulations populated with AI-powered agents.
- Reliable AI agent framework that supports MCP.
LLM Evaluation
- Ragas is your ultimate toolkit for evaluating and optimizing Large Language Model (LLM) applications.
- Open-Source Evaluation & Testing for ML & LLM systems.
- All-in-one toolkit for evaluating LLMs.
- Evaluation and Tracking for LLM Experiments.
- A unified evaluation framework for large language models.
- Deliver Safe & Effective Language Models. 60+ Test Types for Comparing LLM & NLP Models on Accuracy, Bias, Fairness, Robustness & More.
- A rigorous evaluation framework for LLM4Code.
- An open platform for training, serving, and evaluating large language model-based chatbots.
- A small library of LLM judges.
- Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
- Evaluators and utilities for evaluating the performance of your agents.
- A comprehensive library for implementing LLMs, including a unified training pipeline and comprehensive model evaluation.
- An open-source end-to-end LLM Development Platform which also includes LLM evaluation.
LLM Monitoring
- An open-source end-to-end MLOps/LLMOps Platform for tracking, evaluating, and monitoring LLM applications.
- An open-source end-to-end LLM Development Platform which also includes LLM monitoring.
- Provides tools for logging, monitoring, and improving your LLM applications.
- W&B provides features for tracking LLM performance.
- Open source LLM-Observability Platform for Developers. One-line integration for monitoring, metrics, evals, agent tracing, prompt management, playground, etc.
- An open-source ML and LLM observability framework.
- An open-source AI observability platform designed for experimentation, evaluation, and troubleshooting.
- A Lightweight Library for AI Observability.
LLM Prompts
- A Unified Plug-and-Play Prompt Compression Toolkit of Large Language Models.
- Selective Context compresses your prompt and context to allow LLMs (such as ChatGPT) to process 2x more content.
- Library for compressing prompts to accelerate LLM inference.
- Test suite for LLM prompts before pushing them to production.
- Solve NLP Problems with LLMs & easily generate different NLP Task prompts for popular generative models like GPT, PaLM, and more with Promptify.
- PromptSource is a toolkit for creating, sharing, and using natural language prompts.
- DSPy is the open-source framework for programming (rather than prompting) language models.
- Prompt optimization library.
LLM Structured Outputs
- Python library for working with structured outputs from large language models (LLMs). Built on top of Pydantic, it provides a simple, transparent, and user-friendly API.
- An open-source library for efficient, flexible, and portable structured generation.
- Robust (structured) text generation.
- Guidance is an efficient programming paradigm for steering language models.
- A language for constraint-guided and efficient LLM programming.
- A Bulletproof Way to Generate Structured JSON from Language Models.
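The common thread in these libraries is turning free-form model text into data that code can rely on. The sketch below shows that idea with the standard library only: it parses a raw reply as JSON and validates it into a typed record. The `Person` schema and `parse_person` function are hypothetical illustrations, not the API of any library listed here; the Pydantic-based library above automates this kind of validation.

```python
import json
from dataclasses import dataclass


@dataclass
class Person:
    """Hypothetical target schema for the model's reply."""
    name: str
    age: int


def parse_person(raw: str) -> Person:
    """Parse a model's raw text reply into a Person, or raise ValueError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    name, age = data.get("name"), data.get("age")
    if not isinstance(name, str):
        raise ValueError("'name' must be a string")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(age, int) or isinstance(age, bool):
        raise ValueError("'age' must be an integer")
    return Person(name=name, age=age)
```

A caller can catch `ValueError` and retry the request, for example by sending the error message back to the model. Libraries that constrain generation go a step further and prevent invalid output from being produced in the first place.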
LLM Safety and Security
- A collection of automated evaluators for assessing jailbreak attempts.
- An easy-to-use Python framework to generate adversarial jailbreak prompts.
- Adding guardrails to large language models.
- The Security Toolkit for LLM Interactions.
- AuditNLG is an open-source library that can help reduce the risks associated with using generative AI systems for language.
- NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.
- LLM vulnerability scanner.
- The LLM Red Teaming Framework.
LLM Embedding Models
- State-of-the-Art Text Embeddings.
- Fast State-of-the-Art Static Embeddings.
- A blazing fast inference solution for text embeddings models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.
Others
- A modular and extensible Python framework, designed to aid in the creation of high-quality, unbiased datasets to build robust models for MGT-related tasks such as detection, attribution, and boundary detection.
- A library for advanced large language model reasoning.
- An Easy-to-use Knowledge Editing Framework for Large Language Models.
- CodeTF: One-stop Transformer Library for State-of-the-art Code LLM.
- This package integrates Large Language Models (LLMs) into spaCy, featuring a modular system for fast prototyping and prompting, and turning unstructured responses into robust outputs for various NLP tasks.
- Chat with your database (SQL, CSV, pandas, polars, MongoDB, NoSQL, etc.).
- An open-source interactive toolkit for analyzing internal workings of Transformer-based language models.
- Chat with your SQL database. Accurate Text-to-SQL Generation via LLMs using RAG.
- Tools for merging pretrained large language models.
- An Open-Source Toolkit for LLM Watermarking.
- An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs).
- Automatically annotate papers using LLMs.
- Make any LLM think like OpenAI o1 and DeepSeek R1.