Key Takeaways from Tribe AI’s LLM Hackathon

Ben Kinsella

The Tribe AI community recently organized its first-ever LLM Hackathon, bringing together over 60 registered participants, along with an incredible lineup of sponsors and judges. Our objective was clear: as a collective of AI technologists, researchers, and product leaders, we aimed to discover novel applications and infrastructure for LLMs in a collaborative and supportive environment. 

As cutting-edge advances push the boundaries of what LLMs can do, it felt crucial for us to stay abreast of these developments through hands-on experience. We focused on the most pressing questions and hottest trends in AI:

  • What are the emergent behaviors of LLMs? 
  • To what extent do LLMs still struggle with negation, and how can we prevent them from generating unreliable or fabricated information?
  • In what situations can smaller, custom-trained models outperform larger, general-purpose language models? What strategies can we use to parallelize tasks and improve response times?
  • How can user interfaces and tools improve the steerability and customization of chatbot experiences?
  • What new approaches or techniques are required for effective asynchronous scientific computing?
  • What does it take to train and run one’s own LLM, and how do the high memory requirements and GPU dependency of these models pose challenges? How can we mitigate or work around these challenges?

Grappling with these questions was critical to navigating the LLM Hackathon successfully.

What did we learn? 

During the week-long event, participants attended informal sessions to share learnings, challenges, and feedback. Here are a few key takeaways that emerged during the hackathon.

Theme #1: Performance and Efficiency of LLMs

Key takeaway: Smaller, custom-trained models proved more effective than general-purpose LLMs many times their size
  • LLMs tend to hallucinate - which is fine if you want to draw an image of a cowboy panda on the moon. But what is the impact of hallucination on more precision-dependent tasks?
  • For instance, if you’re trying to calculate how much revenue was generated in Q3 of 2023 (as the Soleda project did), an exact number is crucial. Surprisingly, we found that a smaller model trained on high-quality data can outperform an LLM 100x its size.

Theme #2: Latency as a Technical Constraint

Key Takeaway: Managing latency in AI applications is paramount to user experience and requires innovative solutions, such as task parallelization and streaming outputs, to enhance responsiveness and maintain user engagement.
  • Latency emerged as another primary challenge for many teams. APIs like Cohere/GPT often took seconds - or even minutes - to respond when embedding a large dataframe (e.g., Arxiv papers). Additionally, post-processing tasks such as creating embeddings and filtering results could take more than 30 seconds or even a few minutes. 
  • Question-answering tasks involve N+1 API calls, with the last call requiring a large context. This can cause significant delays, especially when multiple sources are considered. Trade-offs exist between answer completeness and response time.
  • Considering users are accustomed to instant responses from Google, even minimal latency can impact the user experience. This may also explain why ChatGPT streams its output (as if someone were typing) to make responses feel faster.
  • Some teams found solutions to improve speed by parallelizing tasks, such as calling the GPT API or downloading PDFs. 
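The parallelization approach above can be sketched with Python's standard library. Here `embed` is a hypothetical stand-in for a slow network call to an embedding API (e.g., Cohere's); the structure is what matters, not the placeholder function.

```python
from concurrent.futures import ThreadPoolExecutor

def embed(text: str) -> list[float]:
    # Hypothetical stand-in for a network call to an embedding API;
    # swap in the real client call in practice.
    return [float(len(text)), float(text.count(" ") + 1)]

def embed_all(texts: list[str], max_workers: int = 8) -> list[list[float]]:
    # Run the calls concurrently in a thread pool. Threads work well here
    # because the workload is I/O-bound; executor.map preserves input order.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(embed, texts))

vectors = embed_all(["Attention Is All You Need", "BERT", "GPT-3"])
```

The same pattern applies to downloading PDFs: each download becomes one task submitted to the pool, so total wall-clock time approaches that of the slowest single call rather than the sum of all calls.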

Theme #3: Memory Requirements

Key takeaway: Implementing advanced LLMs (e.g., LLaMA) can pose significant challenges due to the high computational resources required, necessitating careful planning and resource allocation
  • Several teams intended to use embeddings from LLaMA, Facebook's LLM. However, the LLaMA weights were not available within the timeframe of the hackathon.
  • Using LLaMA for hackathon projects was also challenging due to its high memory requirements - substantial RAM and roughly 30GB of GPU memory - which was infeasible to work around in just a few days.
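As a back-of-envelope check (our sketch, not the teams' exact figures), the memory needed just to hold a model's weights is parameter count times bytes per parameter:

```python
def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    # Weights only; activations, optimizer state, and the KV cache add more.
    return n_params * bytes_per_param / 1024**3

# A 13B-parameter model in fp16 (2 bytes/param) needs roughly 24 GB
# for weights alone -- already beyond most single consumer GPUs.
print(round(weight_memory_gb(13e9, 2), 1))
```

This is why quantization (1 byte or less per parameter) and multi-GPU sharding are the usual workarounds, neither of which is trivial to set up in a few hackathon days.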

Theme #4: User Experience and Interface Design

Key takeaway: Enhancing user steerability in chatbot experiences is crucial, but not always easy to achieve. Dynamic query enrichment and other innovative approaches enable personalized, contextually aware interactions.
  • Enhancing user steerability in chatbot experiences is essential because it directly influences user engagement, satisfaction, and the overall effectiveness of the AI system. With improved steerability, users can have more natural, contextually aware interactions that feel personalized to their needs.
  • From a user's perspective, a chatbot that understands their intent and adapts its responses accordingly provides a far more satisfying and efficient interaction than one that follows a rigid, predetermined script. It can lead to better problem resolution, faster access to information, and a more engaging, human-like conversational experience.
  • One team, “Distributed Wikipedia Expert” led by Tommaso Furlanello, developed two tools to enhance user steerability. 
  • The first tool enabled dynamic query enrichment: the user interface lets users dynamically enrich queries to the language model with user-filtered results from vector search, focusing on Cohere's open-sourced Wikipedia embeddings and on the source code of installed pip packages parsed with LibCST.
  • The second was a pipeline the team called Automatic Perspective Prompting: a multi-step pipeline that searches Wikipedia for topic-intersecting information and distills it into a system prompt that steers the chatbot's personality toward that perspective. The team produced 400 such perspectives, which they open sourced.
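A minimal sketch of the dynamic-enrichment idea (function and variable names are ours, not the team's): passages the user keeps after vector search are spliced into the prompt as context before it is sent to the language model.

```python
def enrich_query(query: str, passages: list[str], keep: list[int]) -> str:
    # Splice only the user-selected passages into the prompt; the indices
    # in `keep` would come from checkboxes (or similar) in the UI.
    context = "\n\n".join(passages[i] for i in keep)
    return f"Context:\n{context}\n\nQuestion: {query}"

passages = [
    "Wikipedia: The Beatles were an English rock band formed in 1960.",
    "Wikipedia: Liverpool is a city in northwest England.",
]
prompt = enrich_query("Where were the Beatles formed?", passages, keep=[0])
```

The steerability comes from the `keep` list: because the user filters the retrieved passages before they enter the prompt, the model's context reflects the user's intent rather than raw retrieval scores.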

Theme #5: Complexities and Innovations in Pipeline Creation

Key takeaway: Creating complex pipelines with multiple models requires innovative approaches
  • Building a complex pipeline with hundreds of model calls, each with unpredictable API call completion times, required new forms of multi-threaded asynchronous scientific computing. 
  • This approach allows for the execution of multiple tasks in an overlapping time period, which is crucial when dealing with hundreds of models with varied completion times.
  • Moreover, this computing had to cope with non-deterministic completion times, which calls for flexibility in pipeline design. Real-world situations present unpredictable scenarios, so pipelines need to adapt to varying time requirements and conditions rather than adhering to a strictly deterministic schedule.
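One common pattern for this kind of multi-call pipeline (a sketch under our own assumptions, not the teams' exact code) is `asyncio` with a semaphore: calls with varied completion times overlap freely, while the semaphore caps how many are in flight at once.

```python
import asyncio

async def call_model(i: int, sem: asyncio.Semaphore) -> int:
    # Stand-in for an API call with unpredictable latency.
    async with sem:  # cap the number of in-flight requests
        await asyncio.sleep(0.001 * (i % 3))
        return i * 2

async def run_pipeline(n: int, max_concurrency: int = 16) -> list[int]:
    sem = asyncio.Semaphore(max_concurrency)
    # gather overlaps the calls but still returns results in input order,
    # even though individual completion times are non-deterministic.
    return await asyncio.gather(*(call_model(i, sem) for i in range(n)))

results = asyncio.run(run_pipeline(10))
```

Scaling `n` to hundreds of calls changes nothing structurally; only `max_concurrency` needs tuning against the provider's rate limits.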

Theme #6: Challenges with LLM Usage and Control

Key takeaway: Effectively using and controlling LLMs in context-specific scenarios (e.g., IF logic) presents significant challenges that can be addressed through innovative prompt design and frameworks
  • One of the biggest challenges for some teams was LLM usage and control through in-context learning in the prompt.
  • For example, it was very difficult to include “IF logic” in the prompt so that the model responds differently across thought-process flows - for instance, deciding whether to use a tool based on the model's assessment of its own knowledge of a given question. However, one team managed to implement IF logic in the prompts using a ReAct-style framework.
  • Negation - a common issue in language models - posed another obstacle. LLMs struggle to handle negation and often fail to decline (e.g., “I cannot answer that”, “I do not know”). One team managed to design a prompt that forces the LLM to argue (see here for a relevant article on the “negation problem” with LLMs).
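The IF-logic idea can be sketched as a ReAct-style prompt scaffold (the exact wording below is our illustration, not the team's prompt): the model is told to branch on its own confidence before deciding whether to call a tool.

```python
def build_react_prompt(question: str) -> str:
    # ReAct-style scaffold: interleave Thought / Action / Answer steps and
    # encode the IF logic as explicit branching instructions.
    return (
        "Follow this loop:\n"
        "Thought: assess whether you already know the answer.\n"
        "IF you are confident, reply `Answer: <your answer>`.\n"
        "IF NOT, reply `Action: search[<query>]` and wait for an Observation.\n"
        "Never invent facts; say `Answer: I do not know` if the search fails.\n\n"
        f"Question: {question}\n"
    )

prompt = build_react_prompt("What year was GPT-3 released?")
```

The branching lives entirely in natural language: the runtime simply parses the model's reply for an `Action:` prefix and, if present, executes the tool and feeds the result back as an Observation.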

Who were the winners?

A total of ten groups presented their results after a rigorous week of hacking. Each presentation was evaluated and scored by three expert judges - Paige Fox, Adrian Bauer, and Oleksandr Paraska - who selected the three winners.

First Place: “LLMOps Stack on Kubernetes” 

The first-place winners deployed an end-to-end pipeline for data generation, fine-tuning, and deployment of an LLM, using a personal finance use case. The team trained their model with RLHF on “r/personalfinance” data and integrated a UI to capture user feedback so it could be looped back to improve the model in the future.

Team: Rahul Parundekar, Daniel Vainsencher, PhD, Shri Javadekar, Yulin J. Kuo, Yada Pruksachatkun, and Bryan Davis.

Second place: “AI-Research Assistant”

The second-place team built a chat-based research assistant. The tool uses LLMs to help users quickly explore relevant topics and retrieve answers to their questions from academic papers on arXiv.

Team: Kiyo (Kiyohito) Kunii, Yifan Yuan, and Hector Lopez Hernandez, PhD

Third place: “Soleda AI”

Third place went to Soleda, an AI-powered analytics agent trained on proprietary marketing conversations that performs NLU on users’ requests to generate business insights. The product showed tremendous value for growth marketing teams, who must respond quickly to dynamic market conditions and adapt their strategies to evolving analytics needs, equipping stakeholders to make data-driven decisions.

Team: Derek Chen

Acknowledgments

We would like to express our sincere gratitude to all the participants who contributed to the success of this LLM hackathon. Your efforts, innovative ideas, and enthusiasm to learn and collaborate played a pivotal role in making this event possible. Your contributions have not only elevated the hackathon itself, but have also expanded our collective knowledge and advanced innovation in this field. 

We would also like to extend our thanks to our sponsors, Cohere and Modal Labs, for their generous support in providing credits and technical assistance to the participants throughout the event. Additionally, we are grateful to Richard Abrich from OpenAdapt.ai and his company for sponsoring some of the cash prizes.

Lastly, we extend our gratitude to our three judges - Paige Fox, Adrian Bauer, and Oleksandr Paraska - for taking the time to provide feedback on projects and enrich the Tribe AI community. Thank you!

Product @ Tribe
Ben Kinsella
Benjamin Kinsella works at the intersection of product, ML, and operations, partnering with the world’s top technologists to drive impact using ML and AI. With a background in linguistics, he is excited about the potential of LLMs and their outsized role in domains like education. Outside of his work at Tribe AI, Benjamin also lectures at Columbia University.