Vectara Unveils Open-Source Hallucination Evaluation Model To Detect and Quantify Hallucinations in Top Large Language Models

SANTA CLARA, Calif., Nov. 06, 2023 (GLOBE NEWSWIRE) -- Large Language Model (LLM) builder Vectara, the trusted Generative AI (GenAI) platform, has released its open-source Hallucination Evaluation Model. This is a first-of-its-kind initiative to offer a commercially available, open-source model that measures the accuracy and level of hallucination in LLMs, paired with a publicly available and regularly updated leaderboard. Vectara is also inviting other model builders like OpenAI, Cohere, Google, and Anthropic to participate in defining an open and free industry standard in support of self-governance and responsible AI.

By launching its Hallucination Evaluation Model, Vectara is increasing transparency and objectively quantifying hallucination risks in leading GenAI tools, a critical step toward removing barriers to enterprise adoption, stemming dangers like misinformation, and enacting effective regulation. The model is designed to quantify how much an LLM strays from facts while synthesizing a summary related to previously provided reference materials.

“In order to realize the true promise of Generative AI, we first have to tackle the challenge of hallucinations,” said Matei Zaharia, CTO and Co-Founder of Databricks. “The launch of the Hallucination Evaluation Model to the Hugging Face community encourages industry co-innovation and accountability through a powerful measurement tool accessible for all LLM builders.”

The Hallucination Evaluation Model launch includes releasing Vectara’s measurement code base as an open-source model on Hugging Face as well as a publicly accessible Leaderboard available from Vectara. The Leaderboard serves as a quality metric for LLM factual accuracy, similar to how credit ratings or FICO scores function for financial risk, giving businesses and developers insight into the realities of different GenAI tools before implementing them.

“For organizations to effectively implement Generative AI solutions including chatbots, they need a clear view of the risks and potential downsides,” said Simon Hughes, AI researcher and ML engineer at Vectara. “For the first time, Vectara’s Hallucination Evaluation Model allows anyone to measure hallucinations produced by different LLMs. As a part of Vectara’s commitment to industry transparency, we’re releasing this model as open source, with a publicly accessible Leaderboard, so that anyone can contribute to this important conversation.”

Key Features of Vectara’s Hallucination Evaluation Model:

Objective Measurement: This model provides much-needed visibility into LLMs' ability to synthesize data without introducing hallucinations. Many LLM vendors make claims about their ability to mitigate the impact of hallucinations, but until now, there have been no objectively verifiable methods for detecting and quantifying instances of irrelevant or incorrect data in model outputs. To fill that gap, Vectara built a machine-learning model, tuned for real-world performance and using the latest advancements in hallucination research, to evaluate LLM summarizations without requiring objective scoring or influence.

Transparency Through Open Source: The Hallucination Evaluation Model is available on Hugging Face under an Apache 2.0 License for developers and industry stakeholders to integrate into their own pipelines. Developers can also use the open-source evaluation model to verify the accuracy of Vectara’s platform.

Dynamic Leaderboard: Vectara’s AI researchers and ML engineers (in collaboration with the open source community) will maintain and continually update the Leaderboard, showcasing the hallucination impact of different LLMs and offering a clear comparative perspective as new models emerge. The Leaderboard lists the accuracy and hallucination rates for each model tested in response to the same set of prompts.
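As a rough illustration of how per-model accuracy and hallucination rates like those on the Leaderboard could be derived, the minimal Python sketch below aggregates a list of per-summary consistency scores into the two figures. It does not call Vectara's model and is not Vectara's published method. It assumes, hypothetically, that each summary has already been scored between 0 and 1 and that a summary scoring at or above 0.5 counts as factually consistent. The names `ModelReport`, `summarize_scores`, `rank_reports`, and the example model names are invented for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelReport:
    """One leaderboard-style row for a single LLM (hypothetical structure)."""
    name: str
    accuracy: float            # share of summaries judged consistent
    hallucination_rate: float  # share of summaries judged hallucinated


def summarize_scores(name: str, scores: List[float], threshold: float = 0.5) -> ModelReport:
    """Turn per-summary consistency scores into accuracy and hallucination rate.

    The 0.5 threshold is an assumption made for this sketch.
    """
    if not scores:
        raise ValueError("at least one score is required")
    consistent = sum(1 for s in scores if s >= threshold)
    accuracy = consistent / len(scores)
    return ModelReport(name=name, accuracy=accuracy, hallucination_rate=1.0 - accuracy)


def rank_reports(reports: List[ModelReport]) -> List[ModelReport]:
    """Order models from lowest to highest hallucination rate."""
    return sorted(reports, key=lambda r: r.hallucination_rate)


# Example with made-up scores for two made-up models, scored on the same prompts.
report_a = summarize_scores("model-a", [0.9, 0.8, 0.2, 0.6])
report_b = summarize_scores("model-b", [0.4, 0.7, 0.1, 0.3])
for row in rank_reports([report_b, report_a]):
    print(f"{row.name}: accuracy={row.accuracy:.2%}, hallucination rate={row.hallucination_rate:.2%}")
```

Scoring every model against the same prompts, as the Leaderboard does, is what makes the resulting rates comparable across models.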

The Leaderboard shows that OpenAI’s models have the strongest performance, followed by the Llama 2 models, Cohere, and Anthropic. Google’s PaLM models scored lower on the Leaderboard.

“Hallucination is one of the most serious issues to consider when deploying production LLMs. Having an open source benchmark model that can evaluate factual accuracy in a quantifiable way will allow developers to directly address the problems,” said Waleed Kadous, Chief Scientist at Anyscale. “Vectara’s new model sets the industry standard for measuring the extent to which LLMs hallucinate, and we’re excited to work with them as a launch partner.”

Vectara has led industry efforts to address hallucinations as a critical barrier to the safe, effective, and accurate use of GenAI. The model doesn’t solve hallucinations directly but rather enables more informed adoption and better decision-making by measuring the frequency and severity of this phenomenon. Greater transparency into the quality of LLM-produced summarizations allows LLM users to evaluate GenAI solutions according to the risk profile of the intended use case.

GenAI adoption in highly regulated industries like legal, healthcare, finance, energy, and government will hinge upon vendors' ability to provide solutions with low to nearly zero risk of factual inaccuracies. Hallucinations have already been raised by stakeholders in these sectors as a serious issue. Until now, however, there has been no way to objectively compare the performance of available models outside of academic benchmarks, which don’t always translate to real-world settings.

Hallucinations also factor heavily in ongoing dialogue about GenAI regulation. Effective government oversight requires measurement tools universally recognized as transparent and objective. Vectara’s open-source model serves as an industry standard, providing the missing link to legislation that virtually all industry leaders agree is needed. With concerns around misinformation and other AI risks rising ahead of the U.S. presidential election and other geopolitical events, the Hallucination Evaluation Model and Leaderboard provide a tangible step toward data-driven and accessible oversight mechanisms.

About Vectara
Vectara is an end-to-end platform that empowers product builders to embed powerful Generative AI features into their applications with extraordinary results. Built on a solid hybrid-search core, Vectara delivers the shortest path to an answer or action through a safe, secure, and trusted entry point. Vectara is built for product managers and developers with an easily leveraged API that gives full access to the platform's powerful features. Vectara’s Retrieval Augmented (Grounded) Generation allows businesses to quickly, safely, and affordably integrate best-in-class conversational AI and question-answering into their applications with zero-shot precision. Vectara never trains its models on customer data, allowing businesses to embed generative AI capabilities without the risk of data or privacy violations. To learn more about Vectara, visit www.vectara.com.

Media Contact
Carly Bourne
carly@bulleitgroup.com
423-443-0449

Updated: 6-11-2023, 17:00
