How to Build an AI Telegram Bot with Vector Memory on Qdrant
The idea for this pet project grew out of a desire to write my own AI agent. I set minimal technical requirements for myself: the agent should have several states, be able to run tools, and use RAG to answer questions. That led to the idea of a personal Telegram AI bot that remembers information I give it, so that whenever I need to, I can ask it what it has remembered. Something like a notebook, except an AI notebook that can answer questions. On top of that, I decided to add a feature for running commands on a server: commands described in plain language, which the bot translates into terminal commands.
Initially I considered LangChain. It is a very good tool: it lets you plug in vector databases, use different LLMs both for inference and for embeddings, and describe the agent's logic as a state graph, and you can call ready-made tools. At first glance everything looks convenient and simple, especially in the typical, straightforward examples. But after digging a little deeper, I felt that the cost of learning the framework did not pay off. It is simpler to call the LLM, the embedding model, and Qdrant directly over their REST APIs, and to describe the agent's logic in code with an enum of states and a match over those states. Besides, LangChain was originally written in Python. I want to write in Rust, and using a Rust port of LangChain is a dubious pleasure that usually breaks down at the worst possible moment: something has not yet been ported to Rust.
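To make that design concrete, here is a minimal sketch of the enum-of-states-plus-match idea with direct Qdrant REST calls. It is shown in Python for brevity (the post targets Rust), and the state names, collection name, and embed helper are illustrative assumptions, not the author's actual code.

```python
from enum import Enum, auto

import requests

QDRANT_URL = "http://localhost:6333"  # assumed local Qdrant instance


class State(Enum):
    # Illustrative states; the post only says "several states".
    REMEMBER = auto()     # embed the message and store it in Qdrant
    ANSWER = auto()       # RAG: search Qdrant, feed the hits to the LLM
    RUN_COMMAND = auto()  # turn a plain-language request into a shell command


def embed(text: str) -> list[float]:
    # Hypothetical helper: in the real bot this would call an embedding API.
    raise NotImplementedError


def handle(state: State, text: str) -> None:
    # The match-over-states dispatch described in the post.
    match state:
        case State.REMEMBER:
            requests.put(
                f"{QDRANT_URL}/collections/notes/points",
                json={"points": [{"id": 1, "vector": embed(text),
                                  "payload": {"text": text}}]},
            )
        case State.ANSWER:
            hits = requests.post(
                f"{QDRANT_URL}/collections/notes/points/search",
                json={"vector": embed(text), "limit": 3, "with_payload": True},
            ).json()["result"]
            ...  # pass the hit payloads to the LLM as RAG context
        case State.RUN_COMMAND:
            ...  # ask the LLM to translate `text` into a terminal command
```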
Big update to my Embeddings Playground: I added support for its first free-to-use embedding model, "all-MiniLM-L6-v2" from Sentence Transformers (https://www.sbert.net/).
Try the Embeddings playground here: https://embeddings.svana.name
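For reference, the same model is a few lines locally with the sentence-transformers package; the example sentences are mine.

```python
from sentence_transformers import SentenceTransformer, util

# Load the freely usable model the playground added.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = ["Embeddings map text to vectors.", "Vectors enable semantic search."]
embeddings = model.encode(sentences)  # shape: (2, 384)

# Cosine similarity between the two example sentences.
print(util.cos_sim(embeddings[0], embeddings[1]))
```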
Nomic Embed Code: #embeddings built specifically for code, from Nomic.
https://www.nomic.ai/blog/posts/introducing-state-of-the-art-nomic-embed-code
We are very happy that our colleague @GenAsefa has contributed the chapter "Neurosymbolic Methods for Dynamic Knowledge Graphs", written together with Mehwish Alam and Pierre-Henri Paris, to the newly published Handbook on Neurosymbolic AI and Knowledge Graphs.
Handbook: https://ebooks.iospress.nl/doi/10.3233/FAIA400
Our chapter on arXiv: https://arxiv.org/abs/2409.04572
Should you use OpenAI (or other closed-source) embeddings?
1. Try the lightest embedding model first
2. If it doesn’t work, try a beefier model and do a blind comparison
3. If you are already using a relatively large model, only then try a blind test against a proprietary model. If you really find that the closed-source model is better for your application, then go for it.
Paraphrased from https://iamnotarobot.substack.com/p/should-you-use-openais-embeddings
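A rough sketch of what the comparison in steps 2-3 can look like in code: score two models on the same retrieval task and only keep the bigger one if it clearly wins. The dataset, model pair, and recall metric are illustrative assumptions; a truly blind test would also hide which model produced which score until the end.

```python
from sentence_transformers import SentenceTransformer, util

queries = ["how do I reset my password?"]
docs = ["To reset your password, open Settings.", "Our office hours are 9-5."]
relevant = [0]  # index of the correct doc for each query


def recall_at_1(model_name: str) -> float:
    model = SentenceTransformer(model_name)
    q = model.encode(queries)
    d = model.encode(docs)
    hits = util.cos_sim(q, d).argmax(dim=1)  # best doc per query
    correct = sum(int(hits[i]) == relevant[i] for i in range(len(queries)))
    return correct / len(queries)


# Lightest model first, then a beefier one.
for name in ["all-MiniLM-L6-v2", "all-mpnet-base-v2"]:
    print(name, recall_at_1(name))
```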
Poster from our colleague @epoz from UGent-IMEC Linked Data & Solid course. "Exploding Mittens - Getting to grips with huge SKOS datasets" on semantic embeddings enhanced SPARQL queries for ICONCLASS data.
Congrats on the 'best poster' award ;-)
poster: https://zenodo.org/records/14887544
iconclass on GitHub: https://github.com/iconclass
#rdf2vec #bert #llm #embeddings #iconclass #semanticweb #lod #linkeddata #knowledgegraphs #dh @nfdi4culture @fiz_karlsruhe
New Video Alert!
Explore advanced image generation with Stable Diffusion in our latest "GenAI's Lamp" tutorial. Learn how to use #Embeddings and #LoRAs to create stunning visuals.
Watch now! https://youtu.be/mZ6eVw8-MM8
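For those who prefer code to video, a hedged sketch of the two techniques in Hugging Face diffusers; the checkpoint, textual-inversion concept, and LoRA path are placeholders, not taken from the tutorial.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Textual-inversion embedding: adds a new token the prompt can use.
pipe.load_textual_inversion("sd-concepts-library/cat-toy")

# LoRA: lightweight fine-tuned weights applied at inference time.
pipe.load_lora_weights("path/to/lora")  # placeholder path

image = pipe("a <cat-toy> on a desk, studio lighting").images[0]
image.save("out.png")
```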
I wrote a post about using #embeddings to map out speeches from the Swedish parliament. Would love to hear your thoughts.
https://noterat.github.io/posts/noteringar/202407301845.html
Is there a consensus process, or a good paper on the state of the art, for using #embeddings & #LLMs to do the kinds of things that used to be done with topic models? I imagine that for tasks with pre-defined classifications, prompts are sufficient, but are there any recommendations for identifying latent classes? After reading the paper below, I think I'll want to use local models. #machinelearning https://drive.google.com/file/d/1wNDIkMZfAGoh4Oaojrgll9SPg3eT-YXz/view
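One common recipe for the latent-class case (the approach tools like BERTopic build on): embed the documents, cluster the vectors, then label the clusters by hand or with an LLM. A minimal sketch with an illustrative local model and toy data:

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

docs = ["budget debate ...", "healthcare reform ...", "defense spending ..."]
embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(docs)

# Cluster count is a free parameter, just like the topic count in LDA.
labels = KMeans(n_clusters=2, n_init="auto").fit_predict(embeddings)
for doc, label in zip(docs, labels):
    print(label, doc[:40])
```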
Major Update for Vector Search in SQLite
#SQLite-vec v0.1.6 introduces powerful new features:
• Added support for #metadata columns enabling WHERE clause filtering in #KNN queries
• Implemented partition keys for 3x faster selective queries
• New auxiliary columns for efficient unindexed data storage
• Compatible with #embeddings from any provider
Key improvements:
• Store non-vector data like user_id and timestamps
• Filter searches using metadata constraints
• Optimize query performance through smart partitioning
• Enhanced data organization with auxiliary columns
Performance focus:
• Partition keys reduce search space significantly
• Metadata filtering streamlines result selection
• Auxiliary columns minimize JOIN operations
• Binary quantization options for speed optimization
#Database integration:
• Supports boolean, integer, float & text values
• Works with standard SQL queries
• Enables complex search combinations
• Maintains data consistency
Source: https://alexgarcia.xyz/blog/2024/sqlite-vec-metadata-release/index.html
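Those release notes map onto a short sketch with the sqlite-vec Python bindings. The schema and data here are illustrative; the DDL follows the linked post's conventions (a partition-key column, plain metadata columns, and a +-prefixed auxiliary column), so check the post for the exact syntax.

```python
import sqlite3

import sqlite_vec

db = sqlite3.connect(":memory:")
db.enable_load_extension(True)
sqlite_vec.load(db)
db.enable_load_extension(False)

# user_id partitions the index, created_at is filterable metadata,
# +title is unindexed auxiliary storage.
db.execute("""
  CREATE VIRTUAL TABLE notes USING vec0(
    user_id INTEGER PARTITION KEY,
    embedding FLOAT[4],
    created_at INTEGER,
    +title TEXT
  )
""")

db.execute(
    "INSERT INTO notes(user_id, embedding, created_at, title) VALUES (?, ?, ?, ?)",
    (1, sqlite_vec.serialize_float32([0.1, 0.2, 0.3, 0.4]), 1730000000, "first note"),
)

# KNN query combining the vector MATCH with metadata WHERE constraints.
rows = db.execute(
    """
    SELECT title, distance FROM notes
    WHERE embedding MATCH ? AND k = 5
      AND user_id = 1 AND created_at > 1700000000
    """,
    (sqlite_vec.serialize_float32([0.1, 0.2, 0.3, 0.4]),),
).fetchall()
print(rows)
```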
Great read on binary vector embeddings & why they are so impressive.
In short, they can retain 95+% retrieval accuracy with 32x compression and ~25x retrieval speedup.
https://emschwartz.me/binary-vector-embeddings-are-so-cool/
Evan Schwartz
#ai #appreciation #LLM #embeddings #scour #search
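The trick is easy to demonstrate: keep only the sign of each float32 dimension (hence the 32x compression) and compare vectors with Hamming distance. A small illustration with random placeholder vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
vecs = rng.normal(size=(1000, 1024)).astype(np.float32)  # 4 KB per vector

binary = np.packbits(vecs > 0, axis=1)  # 128 bytes per vector: 32x smaller

query = binary[0]
# Hamming distance = popcount of XOR; far cheaper than float dot products.
dists = np.unpackbits(binary ^ query, axis=1).sum(axis=1)
print(dists.argsort()[:5])  # approximate nearest neighbours
```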
We hear more and more about #embeddings: what are they, how are they generated, and how can they be useful in operational workflows?
A simple explanation, with some usage examples: https://www.alessiopomaro.it/embeddings-cosa-sono-esempi/.
We also offer some important reflections on why awareness of these systems matters for getting good performance.
Screaming Frog introduces APIs for interfacing with #OpenAI, #Google, and #Ollama models.
It works on the HTML saved during crawling, whereas the previous version used custom JavaScript snippets executed while pages rendered.
It can generate #embeddings and content with custom prompts over selectable contexts (via predefined and custom extractors).
#txtai - All-in-one #embeddings database combining vector indexes, graph networks & relational databases
Key Features:
• Vector search with SQL support, object storage, topic modeling & multimodal indexing for text, documents, audio, images & video
• Built-in #RAG capabilities with citation support & autonomous #AI agents for complex problem-solving
• #LLM orchestration supporting multiple frameworks including #HuggingFace, #OpenAI & AWS Bedrock
• Seamless integration with #Python 3.9+, built on #FastAPI & Sentence Transformers
Technical Highlights:
• Supports multiple programming languages through API bindings (#JavaScript, #Java, #Rust, #Go)
• Easy deployment: run locally or scale with container orchestration
• #opensource under Apache 2.0 license
• Minimal setup: installation via pip or Docker
Use Cases:
• Semantic search applications
• Knowledge base construction
• Multi-model workflows
• Speech-to-speech processing
• Document analysis & summarization
Learn more: https://github.com/neuml/txtai
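As a quick taste of the API, a minimal sketch that builds an in-memory index and queries it both semantically and through the SQL interface mentioned above; the data is illustrative.

```python
from txtai import Embeddings

embeddings = Embeddings(content=True)  # store text alongside vectors
embeddings.index([
    "txtai is an all-in-one embeddings database",
    "SQLite is a small relational database",
])

# Plain semantic search...
print(embeddings.search("vector search library", 1))

# ...or the SQL interface with a similar() clause.
print(embeddings.search(
    "SELECT id, text, score FROM txtai WHERE similar('vector search library')"))
```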
Yesterday, at Festival Biblico Tech, the protagonist was #AI, but above all reflection and critical thinking.
With great hosting by Massimo Cerofolini and Roberta Rocelli, and exceptional companions along the way.
I'm taking home new stimuli, new thoughts, and, like a good nerd, a test to run on #embeddings and on evaluating #LLM bias, discussed with Paolo Benanti.
Wasn’t this…obvious?
“Vector Databases Are The Wrong Abstraction”, Timescale (https://www.timescale.com/blog/vector-databases-are-the-wrong-abstraction/).
Brand new #OpenSource tool for #PostgreSQL - pgai Vectorizer - just launched today from #TimescaleDB! Manage #embeddings with just one #SQL command to keep embeddings in sync with your data in a far easier fashion.
Learn more on the #GitHub repository here: https://github.com/timescale/pgai/blob/main/docs/vectorizer.md
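A hedged sketch of that one-command workflow, run from Python: register a vectorizer once and let the extension keep embeddings in sync with the table. The connection string, source table, and embedding/chunking settings are illustrative placeholders; see the linked docs for the exact API.

```python
import psycopg2

conn = psycopg2.connect("postgresql://postgres:postgres@localhost/postgres")
with conn, conn.cursor() as cur:
    # One SQL call sets up automatic embedding sync for the 'blog' table.
    cur.execute("""
        SELECT ai.create_vectorizer(
            'blog'::regclass,
            destination => 'blog_embeddings',
            embedding   => ai.embedding_openai('text-embedding-3-small', 768),
            chunking    => ai.chunking_recursive_character_text_splitter('content')
        )
    """)
```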
Tomorrow at Advanced SEO Tool we'll present a short technical session titled "Embeddings and SEO: it's ALMOST magic".
Can that "ALMOST" be removed? I think so... with an awareness of these tools, which we'll try to build.
We'll go through practical usage examples, tests, and observations.
Only to discover that it isn't "magic" after all!