[AINews] OpenAI's Instruction Hierarchy for the LLM OS • Buttondown

buttondown.email

Updated on April 25, 2024


AI Discord Recap


  • New AI Model Releases and Benchmarking

    • Llama 3 was released, trained on 15 trillion tokens and fine-tuned on 10 million human-labeled samples. The 70B version surpassed open LLMs on MMLU benchmark, scoring over 80. It features SFT, PPO, DPO alignments, and a Tiktoken-based tokenizer. [demo]

    • Microsoft released Phi-3 mini (3.8B) and 128k versions, trained on 3.3T tokens with SFT & DPO. It matches Llama 3 8B on tasks like RAG and routing based on LlamaIndex's benchmark. [run locally]

    • Internist.ai 7b, a medical LLM, outperformed GPT-3.5 and surpassed the USMLE pass score when blindly evaluated by 10 doctors, highlighting the importance of data curation and physician-in-the-loop training.

    • Anticipation builds for new GPT and Google Gemini releases expected around April 29-30, per tweets from @DingBannu and @testingcatalog.

  • Efficient Inference and Quantization Techniques

    • Fire Attention presents a method to serve open-source models 4x faster than V

CUDA MODE Discord

Lightning AI users have faced a complex verification process, leading to recommendations to contact support or tweet for expedited service. CUDA developers discuss synchronization, memory coalescing, and optimization strategies. PyTorch operations remain on GPU, while debates revolve around Tensor Core performance. Conversations highlight CUDA's educational potential and technological challenges, like conflicts with AMD and NVIDIA graphics setups. The community explores GPU offload, hardware conundrums, and potential improvements in CUDA teaching methods.

Discord Channels Insights

This section provides insights into various Discord channels focused on different AI and machine learning topics. From innovative model releases like Llama 3 to discussions on advancements in OCR technology and the development of conversational AI agents, these channels offer a platform for AI enthusiasts to engage in deep conversations, share updates, and address challenges in the field. The section covers a range of topics including improvements in text-to-speech technology, challenges in GPU resource allocation, debates on the concept of Artificial General Intelligence (AGI), and updates on projects like Open Interpreter and Hydra for enhanced AI development and configuration management.

BotPress and Cohere AMA

The discussion in this section covers the exploration of using Cohere Command-r with RAG in BotPress, surpassing ChatGPT 3.5, and an AI Agent concept for Dubai Investment and Tourism interacting with Google Maps and www.visitdubai.com. Cohere has made its Coral app open-source, encouraging developers to add custom data sources and deploy applications to the cloud. Additionally, insights are provided on the value of technical skills over networking in AI career development, along with guidance on advancing skills in machine learning and LLMs by emphasizing problem-solving and real-world inspiration. Discussions about Cohere CLI tips, LLM Game enhancements, and ongoing community interactions on releases from OpenAI and Google showcase the dynamic landscape of AI application and innovation.

Unsloth AI Updates and Discussions

### Fine-tuning Challenges

  • Users reported issues with fine-tuned Llama-3 models producing gibberish, despite working well during training.
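A common culprit for post-fine-tune gibberish is a prompt-template mismatch: the model is served with different formatting (or a missing end-of-turn token) than it saw in training. As a hedged illustration only, here is a pure-Python sketch of rendering a conversation with Llama-3-instruct-style special tokens (token strings per the Llama 3 model card; this is not Unsloth's code, and real pipelines should use the tokenizer's built-in chat template):

```python
# Sketch: format a conversation with Llama-3-style special tokens.
# Serving a fine-tuned model with a template that differs from training
# is one common source of gibberish output; illustration only.

def format_llama3_prompt(messages):
    """Render a list of {'role', 'content'} dicts into one prompt string."""
    parts = ["<|begin_of_text|>"]
    for msg in messages:
        parts.append(f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n")
        parts.append(msg["content"] + "<|eot_id|>")
    # Cue the model to generate the assistant turn next.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = format_llama3_prompt([{"role": "user", "content": "Hello!"}])
```

In practice, comparing this rendered string against what the training data actually looked like is a quick sanity check before blaming the weights.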

### Unsloth Support Clarification

  • Clarification that Unsloth's open-source version supports continuous pre-training but not full training.

### Model Training Precision

  • Discussion on models training with 4-bit precision and the ability to export in higher precision.
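The idea of training in 4-bit while exporting at higher precision can be illustrated with a toy absmax quantizer: weights are stored as 4-bit codes plus a scale, and dequantization recovers full-precision floats. This is a deliberate simplification (real schemes such as NF4 are block-wise and non-uniform), not Unsloth's implementation:

```python
# Toy absmax 4-bit quantization: store weights as signed 4-bit codes plus a
# per-tensor scale; dequantizing "exports" them back to higher precision.
# Simplified illustration only — not Unsloth's actual quantization code.

def quantize_4bit(weights):
    """Map floats to signed 4-bit levels in [-7, 7] with an absmax scale."""
    scale = max(abs(w) for w in weights) / 7.0 or 1.0
    codes = [round(w / scale) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover higher-precision floats from the 4-bit codes."""
    return [c * scale for c in codes]

w = [0.12, -0.98, 0.45, 0.0]
q, s = quantize_4bit(w)
w_hat = dequantize(q, s)
```

The round-trip error is bounded by half the scale, which is why exporting in fp16/fp32 after 4-bit training loses nothing beyond what quantization already discarded.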

### Training Speed Configuration

  • Queries on unusual completion speeds for Llama3-instruct:7b and seeking input on training speed for specific GPUs.

### Unsloth Pro and Multi-GPU Support

  • Plans for multi-GPU support around May and developing a platform for Unsloth Pro distribution.

Perplexity AI and Perplexity CEO Updates

Perplexity AI, a prominent AI search engine startup, has made headlines with a significant funding round of at least $250 million, eyeing a valuation of up to $3 billion. The company's CEO, Aravind Srinivas, has discussed this new funding and the upcoming launch of its enterprise tool in an exclusive interview with CNBC amidst competition from tech giants like Google. Users have been actively engaging with Perplexity AI's search functions and AI capabilities, exploring features like image description and language translation tools. However, some users have reported visibility issues and limitations within the Perplexity API, such as the inability to upload images. The community has also discussed the potential of FSDP/DoRA for fine-tuning large models and debated the Phi-3 Mini's performance against other models like Llama 3 and GPT-3.5. Snowflake has introduced a new 480B parameter model designed to outperform its contemporaries, sparking conversations about its innovative architecture and specialized dataset.

Syntax Tree Based Code Chunking

An alpha package for converting a venv into datasets through syntax-tree-based chunking is discussed, recursively breaking folders down into modules, classes, and methods while keeping track of nodes. This work is accessible on GitHub at HK3-Lab-Team/PredCST.

  • Model Grounding Challenges with Auto-Generated Reference Data: The conversation highlights problems with a model referencing code debug data, resulting in hallucinations when faced with new code. The discussion suggests that relative positioning may be more effective than exact integers for chunking and referencing.
  • Refining Validation Practices in Models: A deep dive into Pydantic models for validation reveals that recent releases offer more sophisticated, faster, and more expressive tools, advocating a shift from traditional approaches to functional validators.
  • Citation Referencing with Line Number Tokens: The chat explores using special sequential line-number tokens to aid model referencing in citations, though it acknowledges complications with code syntax integrity and potential oversimplification of the model's attention mechanism.
  • Ensuring Output Format Conformity: A discussion on constraining model output format reveals that maintaining order can produce better performance, even for semantically equivalent outputs. Constraints may be implemented through schema order enforcement or regex matching, as seen in projects like lm-format-enforcer on GitHub.
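The chunking idea above can be sketched with Python's stdlib `ast` module: parse a source file and emit one chunk per top-level class or function, keeping node names and line spans. This is an illustration of the technique, not the PredCST implementation:

```python
import ast

# Sketch of syntax-tree-based chunking with the stdlib `ast` module (an
# illustration of the idea, not the PredCST package): split a source file
# into one chunk per top-level class/function, tracking names and line spans.

def chunk_source(source: str):
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "name": node.name,
                "kind": type(node).__name__,
                "lines": (node.lineno, node.end_lineno),
                "code": ast.get_source_segment(source, node),
            })
    return chunks

SRC = "def add(a, b):\n    return a + b\n\nclass Greeter:\n    def hi(self):\n        return 'hi'\n"
chunks = chunk_source(SRC)
```

Recursing into `ClassDef` bodies the same way yields method-level chunks, matching the module → class → method breakdown described above.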

Mixed Graphics Setup and Issues with ROCm

Users with mixed AMD and NVIDIA setups face challenges when installing ROCm for LM Studio, requiring removal of NVIDIA drivers and hardware. Teething issues in the tech preview are acknowledged, with hopes for a more robust solution in the future. Some users are disappointed by the lack of RDNA 1 architecture support. Reports of sporadic model-loading failures in LM Studio ROCm suggest compatibility issues, and the RX 5700 XT card remains unsupported on Windows because the ROCm HIP SDK does not cover it.

CUDA Mode

GitHub

  • Second matmul for fully custom attention by ngc92 · Pull Request #227 · karpathy/llm.c: Considerable speed-up in benchmarks observed. Working on the main script for further improvements.
  • LLM9a: CPU optimization: no description found
  • Courses: no description found
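For reference, the "second matmul" in the llm.c PR above refers to the scores·V product in attention: the first matmul forms QKᵀ, a row-wise softmax follows, and the second matmul mixes the value vectors. A pure-Python sketch of that structure (an illustration of the math, not the fused CUDA kernel):

```python
import math

# Attention needs two matmuls: scores = Q @ K^T (scaled), then out = P @ V
# after a row-wise softmax. This mirrors the structure the PR optimizes;
# it is an illustration of the math, not the llm.c kernel.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    m = max(row)                      # subtract max for numerical stability
    e = [math.exp(x - m) for x in row]
    s = sum(e)
    return [x / s for x in e]

def attention(Q, K, V):
    d = len(Q[0])
    KT = [list(col) for col in zip(*K)]
    scores = [[x / math.sqrt(d) for x in row] for row in matmul(Q, KT)]  # matmul 1
    P = [softmax(row) for row in scores]
    return matmul(P, V)                                                  # matmul 2

Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out = attention(Q, K, V)
```

Because softmax rows sum to 1, each output row is a convex combination of the rows of V, which is a handy invariant for checking a custom kernel against a reference.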

Massively Parallel Crew (4 messages)

  • Guest Speaker Invite Considered: Invite extended to @tri_dao on Twitter to discuss kernel code and optimizations.
  • Clarification of Presentation Content: Member clarified @tri_dao can present on any preferred topic, with interest in flash decoding.

Eleuther

  • New Open Source Generative Image Model Arena Launched: ImgSys project introduced, showcasing generative image model arena at imgsys.org.
  • Chain-of-Thought Prompting Leaderboard Unveiled by Hugging Face: Unveiling of Open CoT Leaderboard emphasizing reasoning capabilities in model solutions.
  • Assessment of CoT Approaches in Recent Research: Strong focus on CoT prompting techniques and their applications, with some disappointment on dataset focus.
  • Mention of Counterfactual Reasoning: Brief mention of interest in counterfactual reasoning.
  • Reasoning Research as a High Priority Area: Consensus on reasoning importance in AI research.
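Leaderboards like the Open CoT Leaderboard compare accuracy with and without an elicited reasoning trace. As a minimal, hedged sketch of what that contrast looks like at the prompt level (the trigger phrase follows the common zero-shot CoT recipe; the leaderboard's actual prompts may differ):

```python
# Minimal zero-shot chain-of-thought vs. direct prompting: CoT evaluation
# measures how much the elicited reasoning trace improves task accuracy.
# The trigger phrase is the common zero-shot CoT recipe; illustration only.

def cot_prompt(question: str) -> str:
    return f"Q: {question}\nA: Let's think step by step."

def direct_prompt(question: str) -> str:
    return f"Q: {question}\nA:"

q = "If a train travels 60 km in 30 minutes, what is its speed in km/h?"
p_cot = cot_prompt(q)
p_direct = direct_prompt(q)
```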


Research (189 messages🔥🔥)

  • Decoding LLMs - Task-Dependent Performance: Paper on decoding methods improving task performance without labels or demonstrations.
  • Efficient Diffusion Models with Align Your Steps: NVIDIA's Align Your Steps improves sampling speed for Diffusion Models.
  • Facebook's 1.5 trillion parameter Recommender System: HSTU architecture deployed with significant improvements.
  • Economic Approach to Generative AI Copyright Issues: Addressing copyright concerns with generative AI using cooperative game theory.
  • Challenges of Privacy with Generative AI: Research highlighting privacy vulnerabilities with generative AI.


Computational Limits (50 messages🔥)

  • Advancing RWKV Integration in NeoX: Integration of RWKV into GPT-NeoX. Updates and improvements, with a need for JIT compilation and other support.
  • Update on RWKV's Versioning and Portability: Discussion on version numbering for RWKV and potential AMD support with Triton kernels.
  • Tokenization Troubles in Pretraining Data: Issues with tokenizer versions impacting space token splitting.
  • Tokenization And Version Management Frustrations: Frustrations with managing tokenization inconsistencies in NeoX.
  • Tackling Complexities of Token Merging: Discussion on handling token merging discrepancies.
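The space-splitting issue above can be made concrete with a toy comparison: two pre-tokenizers that handle leading spaces differently produce different token streams for the same text, so mixing tokenizer versions silently changes the pretraining data. This is a simplification for illustration, not the NeoX/RWKV tokenizer code:

```python
import re

# Toy illustration of the space-splitting problem: a GPT-2-style
# pre-tokenizer attaches the leading space to the next word, while an
# alternative emits spaces as standalone tokens. Same text, different
# token streams — which is exactly the versioning hazard discussed above.

def split_spaces_attached(text):
    """GPT-2-style: a leading space sticks to the following word."""
    return re.findall(r" ?\S+", text)

def split_spaces_separate(text):
    """Alternative: spaces are their own tokens."""
    return re.findall(r" |\S+", text)

a = split_spaces_attached("hello brave world")
b = split_spaces_separate("hello brave world")
```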


HuggingFace Highlights

XL models are recommended with success reported for models like RealVis V4.0. Users compare Forge UI and A1111, discussing Forge's memory efficiency. The community suggests merging models with Lora training. Excitement surrounds the Stable Diffusion 3.0 release, and recommendations for improving Stable Diffusion outputs include higher resolutions and fine-tuning tactics.

In another section, highlights include the launch of Llama 3 and Phi-3 models, new AI content, and HuggingChat availability on iOS. Further discussions cover OCR tools, the HuggingChat API, model training tactics, and Stable Diffusion setup struggles. Rust integration with the Candle framework, LangChain services, knowledge transfer, and ONNX model conversion also come up across channels. Members share collaborative opportunities, new model releases, project milestones, and learning experiences within the community.

Community Discussions

The community discussions cover various topics related to Mojo, including sorting algorithms, nightly vs stable versions differences, special functions for pointer initialization, and the challenges encountered in tuning Phi-3 models. Additionally, new projects like MoCodes and performance updates related to Max and random number generation are shared. Beyond Mojo, there are discussions on fictional characters, model advancements like Internist.ai 7b, and challenges in training Llama3 models.

Interconnects Memes Channel and Mini Models

  • The memes channel in the Interconnects Discord is now live, with the first messages appearing about an hour after the announcement.
  • Discussion indicates that mini models and a 128k context length model are available on Hugging Face, with a mention of recent availability.
  • A member humorously shares that enabling web search can lead to findings about an Australian politician with the same name, which inadvertently triggers their Google alerts.

Engineering and AI Discussions

OpenInterpreter Discussion:

  • Cloud aspirations for OpenInterpreter O1, including potential integration with brev.dev and Scaleway.
  • Mention of local voice control compatibility with OpenInterpreter 01.
  • Progress in manufacturing the 01 Light and announcement of an upcoming event on April 30th.
  • Call for questions regarding 01 Light manufacturing update.
  • Exploration of running O1 on external devices inspired by AI Pin project.

AI Engineering Explore:

  • Introduction to stable diffusion demos and Intel's OpenVINO Toolkit.
  • Discussion on the ONNX Runtime's use across various ML frameworks.
  • Overview of MLflow's capabilities in simplifying ML and GenAI applications.

Hydra, Perplexity, AI Engineering, and more:

  • Emphasis on the use of Hydra and OmegaConf for configuration management in machine learning projects.
  • Funding round details for Perplexity, a search solution challenging traditional search engines.
  • Introduction to the book 'AI Engineering' focusing on AI application development.
  • Discussion on Prime Intellect's infrastructure for decentralized AI development and funding.
  • Release of a community-driven computer vision course by HuggingFace.
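Hydra's configuration approach, mentioned above, composes a config from groups of YAML files and lets any key be overridden from the command line. A minimal sketch of such a layout (file names and keys here are hypothetical, shown only to illustrate the pattern):

```yaml
# config.yaml — hypothetical Hydra-style config sketch. The defaults list
# composes config groups (e.g. an optimizer/ directory of YAML files), and
# any key can be overridden at launch, e.g.:
#   python train.py model.lr=3e-4 optimizer=sgd
defaults:
  - optimizer: adam
  - _self_

model:
  name: tiny-transformer
  lr: 1e-4
  layers: 6
```

OmegaConf, which Hydra builds on, resolves this into a nested config object with dotted access and interpolation, which is what makes the CLI overrides uniform across a project.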

TimeGPT Discussion:

  • Announcement of an upcoming discussion on TimeGPT in the US paper club.
  • Confirmation of discussion participants including the authors.

Discussions on Tinygrad (George Hotz):

  • Inquiry about diagram creation methods in PRs.
  • Focus on tinygrad topics and avoiding off-topic discussions.
  • Feasibility query regarding rewriting a privacy-preserving tool against facial recognition systems.
  • Recommendations and alternatives for PCIE risers and solutions.
  • Call for documentation on tinygrad operations.

Updates on DiscoResearch:

  • Comparison of Llama3 and Mixtral-8x7B-Instruct-v0.1 performance.
  • Concerns raised on evaluation metrics and prompts formatting.
  • Improvement in DiscoLM German 7b performance post-template correction.
  • Request for comparisons with command-r-plus model.

Enhanced Features and Concerns in DiscoResearch General:

  • Update in the Haystack LLM framework and frustrations due to Hugging Face downtime.
  • Inquiries on batch prompt processing through local mixtral and leveraging llm-swarm for scalable LLM inference.
  • Preferences for local batch processing using litellm.batch_completion over API server setup.

RAG Chatbot Expansion Ideas

A member expressed interest in augmenting a RAG (Retrieval-Augmented Generation) chatbot to display web search results alongside its existing database/PDF knowledge base, and is eager to discuss additional feature ideas with the community.

  • Nested JSON Solutions Sought in Vector DB: A request was made for solutions on defining metadata_field_info in a nested JSON for the Milvus vector database.
  • Launching a Chat Interface Quickly: Queries were raised about the quickest way to create a startup-like interface that allows customer login and chatting with a vector database, using LangChain along with Groq or Llama. Members discussed potential toolkits, mentioning the Vercel AI SDK and Chroma.
  • Langchain Chain Types Video Series Debut: A member announced the launch of a video series dedicated to Langchain chain types, including API Chain, Constitutional Chain, RAG Chain, Checker Chain, Router Chain, and Sequential Chain, with links to the instructional videos.
  • PGVector Store Usage in Chatbots: Information was shared on using a pgvector store as context for chatbots; guidance on acquiring OpenAI embeddings for this purpose was requested and subsequently provided, referencing LangChain documentation.
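The retrieval step at the heart of such a chatbot can be sketched without any vector-DB dependency: embed the query and documents, rank by cosine similarity, and pass the top hits to the model as context. Here a bag-of-words count stands in for a real embedding model, and plain lists stand in for pgvector/Milvus/Chroma; all names are illustrative:

```python
import math
from collections import Counter

# Toy RAG retrieval loop: a bag-of-words "embedding" plus cosine similarity
# stands in for real embeddings and a vector store (pgvector, Milvus,
# Chroma). Names and the embedding scheme are illustrative only.

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    """Return the k docs most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "Invoices are stored as PDFs in the knowledge base.",
    "The chatbot answers questions about company policy.",
    "Web search results can be shown next to database answers.",
]
top = retrieve("show web search results with database answers", docs, k=1)
```

Swapping `embed` for a real embedding API and `docs` for a vector-store query is the whole jump from this sketch to the production versions discussed above.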

LLM Perf Enthusiasts AI

April Showers Bring AI Flowers: A new GPT release is teased with an anticipated launch date of April 29, as per a snippet from @DingBannu's tweet.

Google's Gemini Gearing Up: Google Gemini signals upcoming releases expected at the end of April, around the 29th and 30th, although the dates might shift, as mentioned in @testingcatalog's tweet.


FAQ

Q: What are some new AI models that were recently released and their key features?

A: Llama 3, Phi-3 mini, Internist.ai 7b, and upcoming releases from OpenAI and Google Gemini are discussed. Llama 3 features SFT, PPO, DPO alignments, and a Tiktoken-based tokenizer. Phi-3 mini matches Llama 3 8B on tasks like RAG and routing based on LlamaIndex's benchmark. Internist.ai 7b outperformed GPT-3.5 and surpassed the USMLE pass score.

Q: What is the significance of data curation and physician-in-the-loop training in the performance of medical language models?

A: The success of Internist.ai 7b in surpassing GPT-3.5 and exceeding the USMLE pass score when blindly evaluated by 10 doctors highlights the importance of data curation and physician-in-the-loop training for enhancing the performance of medical language models.

Q: What upcoming AI model releases are anticipated, and who are the sources of these expectations?

A: Anticipation is building for new releases from GPT and Google Gemini expected around April 29-30, as hinted by tweets from @DingBannu and @testingcatalog.

Q: What are some challenges reported in fine-tuning AI models like Llama-3?

A: Users have reported issues with fine-tuned Llama-3 models producing gibberish despite working well during training, indicating challenges with fine-tuning processes.

Q: What is the focus of Massively Parallel Crew discussions in the AI Discord channels?

A: Discussions in the Massively Parallel Crew group range from inviting guest speakers to clarifying presentation content, indicating diverse and engaging conversations on AI-related topics.

Q: What are the key topics discussed in the AI research section, including notable papers and advancements?

A: The AI research section covers decoding methods for LLMs, efficient diffusion models, large-scale recommender systems, generative AI copyright solutions, and privacy challenges with generative AI.

Q: What is the significance of the Discord channels related to Mojo, OpenInterpreter, AI Engineering, and TimeGPT in the AI community discussions?

A: These channels cover discussions on model advancements, configuration management in ML projects, search solutions, decentralized AI development, and upcoming events like the launch of OpenInterpreter 01 Light, providing a platform for sharing knowledge and insights.

Q: What projects or topics are highlighted in the Eleuther section of the AI Discord channels?

A: The Eleuther section discusses the launch of new open-source generative image models, assessments of Chain-of-Thought prompting techniques, and the importance of counterfactual reasoning in AI research.

Q: What are some of the computational limits discussed in the AI Discord channels, and how do they impact AI model development?

A: Topics like tokenization troubles, token merging complexities, and efficient diffusion models are discussed in relation to advancing AI model development while managing computational limits.

Q: What are some of the key discussions in the community regarding projects like Perplexity AI, Hydra, and enhancements in DiscoResearch related to AI applications and innovation?

A: Discussions range from funding rounds for Perplexity AI to the use of Hydra for configuration management and the release of community-driven computer vision courses, showcasing a vibrant ecosystem of AI applications and innovation.
