
CocoIndex
Open-source engine for building scalable data pipelines with custom transformations and incremental updates, specialized for semantic search and RAG applications.

CocoIndex is an open-source engine for managing the data pipelines behind semantic search and RAG applications. It combines custom transformations with incremental updates, so a change to the source data or to the pipeline logic triggers reprocessing of only the affected parts.
Key features include:
- Declarative Pipeline Definition - Simple Python-based syntax for defining complex data flows
- Smart Incremental Processing - Automatically handles updates to data or logic, reprocessing only what's necessary
- Built-in Observability - Comprehensive lineage tracking and monitoring tools
- Versatile Integration - Supports multiple data sources, embedding models, and vector stores
- Production-Ready Scaling - Handles everything from development samples to large-scale deployments
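To make the incremental processing idea concrete, here is a minimal sketch in plain Python. It is not CocoIndex's API or implementation; `fingerprint`, `transform`, `incremental_update`, and `LOGIC_VERSION` are hypothetical names chosen for illustration. The assumption is that each source entry is fingerprinted together with a version tag for the transformation logic, so a change to either one invalidates the cached output.

```python
import hashlib

LOGIC_VERSION = "v1"  # hypothetical tag; bump it whenever the transform changes


def fingerprint(content: str, logic_version: str) -> str:
    """Hash the source content together with the transform version."""
    h = hashlib.sha256()
    h.update(logic_version.encode("utf-8"))
    h.update(b"\x00")  # separator so version and content cannot run together
    h.update(content.encode("utf-8"))
    return h.hexdigest()


def transform(content: str) -> str:
    """Stand-in for an expensive step such as chunking plus embedding."""
    return content.upper()


def incremental_update(sources, cache, logic_version=LOGIC_VERSION):
    """Reprocess only new or changed entries and return their keys.

    `cache` maps key -> (fingerprint, output) and is updated in place.
    """
    reprocessed = []
    for key, content in sources.items():
        fp = fingerprint(content, logic_version)
        cached = cache.get(key)
        if cached is not None and cached[0] == fp:
            continue  # neither the data nor the logic changed
        cache[key] = (fp, transform(content))
        reprocessed.append(key)
    # Drop outputs whose source entry no longer exists.
    for key in list(cache):
        if key not in sources:
            del cache[key]
    return reprocessed
```

Calling `incremental_update` twice with the same sources does no work the second time; editing one entry reprocesses only that entry, and passing a new `logic_version` reprocesses everything.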
For document processing, the platform offers:
- Intelligent chunking with hierarchical support
- Flexible embedding model integration
- Automatic schema management
- Built-in deduplication and reconciliation
- Comprehensive API for index access
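As a rough illustration of what hierarchical chunking can mean, the sketch below splits a document into paragraphs and then into fixed-size word windows, with each chunk recording its parent paragraph. This is a generic two-level scheme written in plain Python, not CocoIndex's chunker; the function name `hierarchical_chunks` and the `max_words` parameter are assumptions made for the example.

```python
def hierarchical_chunks(text: str, max_words: int = 50):
    """Split text into paragraphs, then into word windows of at most max_words.

    Each chunk keeps the index of its parent paragraph and its position
    within that paragraph, so the two-level structure can be rebuilt later.
    """
    chunks = []
    # Level 1: paragraphs, separated by blank lines.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    for p_idx, para in enumerate(paragraphs):
        words = para.split()
        # Level 2: fixed-size windows of words inside the paragraph.
        for start in range(0, len(words), max_words):
            chunks.append({
                "parent": p_idx,
                "position": start // max_words,
                "text": " ".join(words[start:start + max_words]),
            })
    return chunks
```

Keeping the parent index on every chunk lets a retrieval step return the matching chunk together with its surrounding paragraph.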
CocoInsight, the included visualization tool, helps optimize your pipeline with detailed metrics and visual debugging capabilities, making complex data flows understandable and manageable.
Details:
Stars: 886
Forks: 57
Last commit: 14 hours ago
Repository age: 2 months
License: Apache-2.0
We fetched this data automatically from GitHub.