data-annotation

Here are 56 public repositories matching this topic...

cleanlab / cleanlab

Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

Updated Jan 13, 2026
Python

diffgram / diffgram

The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI Catalog to get the most value out of your AI Data.

kubernetes data-science data machine-learning deep-learning image-annotation annotation video-annotation annotations data-analytics labeling datastore datasets annotation-tool data-annotation training-data

Updated Nov 18, 2024
Python

yihong1120 / Construction-Hazard-Detection

Enhances construction site safety using YOLO for object detection, identifying hazards like workers without helmets or safety vests, and proximity to machinery or vehicles. HDBSCAN clusters safety cone coordinates to create monitored zones. Post-processing algorithms improve detection accuracy.

nginx machine-learning computer-vision deep-learning clustering mcp image-processing artificial-intelligence apis object-detection post-processing restful-api hazard-detection hdbscan data-annotation fastapi alert-system fastmcp yolo26

Updated Feb 1, 2026
Python

thepanacealab / SMMT

Social Media Mining Toolkit (SMMT) main repository

tweets annotation twitter-api data-acquisition spacy data-preprocessing gathering data-annotation

Updated Nov 11, 2022
Python

BatsResearch / alfred

A system for prompted weak supervision. Alfred is a powerful tool that leverages large language models to accelerate data annotation.

data weak-supervision annotation-tool vlm data-annotation llm programmatic-weak-supervision prompting

Updated Apr 3, 2025
Python

pixano / pixano

Data-centric AI building blocks for computer vision applications

python machine-learning computer-vision deep-learning data-visualization data-annotation

Updated Mar 27, 2026
Python

saran9991 / llm-data-annotation

Use Large Language Models like OpenAI's GPT-3.5 for data annotation and model enhancement. This framework combines human expertise with LLMs, employs Iterative Active Learning for continuous improvement, and integrates CleanLab (Confident Learning) to ensure high-quality datasets and better model performance.

nlp gpt bert active-learning data-annotation fine-tuning dvc confident-learning noisy-labels mlflow cleanlab gpt-4 llm gpt-3-5-turbo

Updated Sep 11, 2023
Python

ufal / factgenie

Lightweight self-hosted span annotation tool

visualization annotation annotations web-interface annotation-tool data-annotation token-classification llm word-classification span-labeling

Updated Mar 12, 2026
Python

hazegreleases / JIETStudio

A free and open-source all-in-one training tool for YOLOv8, YOLO11, and YOLO26. It automates file structure and YAML files and offers auto-labeling with SAM2, a brush system for uninterrupted labeling, a strong modular augmentation system where anybody can write their own filters, and training, all without having to open a terminal.

training open-source computer-vision image-annotation inference yolo object-detection opencv-python data-annotation labeling-tool dataset-management yolo-gui ultralytics yolov8 yolo11 yolov11 yolo26

Updated Feb 26, 2026
Python

joactr / AnnoTheia

AnnoTheia is a data annotation toolkit that identifies when a person speaks in a scene and transcribes their speech, also offering flexibility to replace modules for different languages.

languages data-annotation fine-tuning active-speaker-detection speech-technologies

Updated Jul 26, 2024
Python

rsgoncalves / text2term

A tool for mapping free-text descriptions of entities to ontology terms

metadata ontology fair metadata-curation data-annotation ontology-mapping

Updated Jun 17, 2025
Python

superannotateai / generated_text_detector

SuperAnnotate HTTP service for Generated Text Detection

nlp detection data-annotation llm generated-text-detection

Updated Dec 17, 2024
Python

minnesotanlp / infoVerse

Jaehyung Kim et al.'s ACL 2023 paper, "infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information"

nlp active-learning dpp data-annotation data-centric data-pruning

Updated Jun 28, 2023
Python

NLPForUA / UA-LLM

The entry point for adapting, training, evaluating, and leveraging various Large Language Models (LLMs) for a wide range of Ukrainian NLP tasks.

nlp benchmark natural-language-processing evaluation transformer ukraine question-answering llama gpt language-model mistral natural-language-understanding zero-shot-learning data-annotation ukrainian-language large-language-models llm

Updated Jan 31, 2024
Python

ziliHarvey / smart-annotation-pointrcnn

A PointRCNN version of SAnE, which is a web-based semi-automatic annotation tool for point cloud data.

deep-learning point-cloud webapp data-annotation pointrcnn

Updated Jul 29, 2020
Python

monatis / asr-annotation-bot

Simple Telegram bot to annotate and verify automatic speech recognition datasets

machine-learning telegram-bot automatic-speech-recognition data-annotation

Updated Mar 30, 2021
Python

jangedoo / jupino

Annotate data using Jupyter notebooks

python jupyter annotation-tool data-annotation

Updated Apr 1, 2022
Python

NotShrirang / LoomRAG

🧠 Multimodal Retrieval-Augmented Generation that "weaves" together text and images seamlessly. 🪡

Updated Mar 29, 2025
Python

NLPForUA / ZNO

Structured test tasks and model tuning scripts for multiple subjects from ZNO - the Ukrainian External Independent Evaluation (ЗНО)

nlp benchmark natural-language-processing math history evaluation dataset exam ukraine llama geography language-model gemma reasoning data-annotation ukrainian-language large-language-models ukrainian-language-dataset reasoning-language-models

Updated May 22, 2025
Python

inboxpraveen / Speech-Annotation-Tool

Review, correct, and export ASR transcripts at scale. Web-based ASR accuracy workbench for reviewing, correcting, and exporting speech-to-text transcripts using Whisper, FFmpeg, and Flask.

transformers speech-recognition accuracy automatic-speech-recognition speech-to-text dataset-generation annotation-tool asr data-annotation labeling-tool huggingface huggingface-transformers data-annotation-tools openai-whisper

Updated Dec 24, 2025
Python
