davidberenstein1957 
posted an update Jan 30
TL;DR: Parquet is awesome, and so is DuckDB!

Datasets on the Hugging Face Hub rely on Parquet files. We can interact with these files using DuckDB as a fast in-memory database system. One of DuckDB's features is vector similarity search, which can be used with or without an index.

blog:
https://huggingface.co/learn/cookbook/vector_search_with_hub_as_backend