Alphawise
Running Robots and Generative Speech AI - view the video inside.
Today's tutorial comes from our friends at Moondream and shows an OpenAI-compatible integration.
What is today’s beat?
RELEASES
🧨 Unitree G1 Agility update
🧨 ElevenLabs moves in web and education
🧨 Google’s zero-shot audio research unveiled
NEWSROOM
🗞️ AI Investment In Europe: The 10 Biggest Deals in 2024
🗞️ NVIDIA Releases NIM Microservices
🗞️ Mira Murati’s AI Startup
BUILDER BYTES
⭐️ Our friends at Moondream shared a tutorial
⭐️ Moondream2: Small Vision-Language Model for Edge Devices
⭐️ Building Knowledge Graph Agents with LlamaIndex Workflows
⭐️ Two-hour walk-through: Building an Agentic AI App with CrewAI
⭐️ Parlant: Text to Speech Library
⭐️ AgentScript: ASTI Development for Dynamic Systems in TypeScript
COMMUNITY
🤩 MIT: Machine Learning with Python: From Linear Models to Deep Learning
🤩 Eachlabs: Build AI Workflows in Seconds
🤩 Google Workspace: AI-powered collaboration for organizations of all sizes
🧨 UNITREE 🧨
AGILE UPGRADE
You may have seen a video of a humanoid robot walking through the streets. That was a Unitree G1, and Unitree just released a new video showing off its agility. At only $16,000, imagine what you could do with an extra pair of hands around the house!
🧨 ELEVEN LABS 🧨
EXPANDS PRESENCE INTO WEB AND EDUCATION

ElevenLabs is rapidly expanding its footprint by bringing conversational AI to websites. It is also exploring how voice AI could transform education by reshaping learning experiences. Here are the latest moves from ElevenLabs, a provider of generative voice AI through its API.
Website Integration: Full support for platforms like Wix, Squarespace and WordPress, where conversational AI can be added through plugins.
Conversational Voice AI in Education: ElevenLabs is making strides in the education sector by using voice-based AI to personalize learning experiences.
Storytelling: Showcases Storyrabbit’s AI-driven storytelling platform, which offers personalized audio stories across genres like art, history, and sports. A very interesting way to promote your niche!
Voice AI has not quite hit the mainstream yet, though OpenAI has started to make its mark there. ElevenLabs' recent moves make voice AI more accessible across industries, from e-commerce to education. Humans like to talk, and for some people it is much easier to express themselves out loud, hence the popularity.
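For builders wondering what calling a generative voice API looks like, here is a minimal sketch that builds an ElevenLabs text-to-speech request using only the Python standard library. It assumes the REST endpoint `POST /v1/text-to-speech/{voice_id}` authenticated with an `xi-api-key` header; the voice ID, the `eleven_multilingual_v2` model choice and the helper names `build_tts_request` and `synthesize` are placeholders of ours, so check the current ElevenLabs API reference before relying on it.

```python
import json
import urllib.request

# Base URL of the ElevenLabs REST API (assumed; verify against the official docs).
API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(text: str, voice_id: str, api_key: str,
                      model_id: str = "eleven_multilingual_v2") -> urllib.request.Request:
    """Build (but do not send) a text-to-speech POST request. Hypothetical helper."""
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-speech/{voice_id}",
        data=body,
        headers={
            "xi-api-key": api_key,           # ElevenLabs authenticates with this header
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",          # ask for MP3 audio back
        },
        method="POST",
    )


def synthesize(text: str, voice_id: str, api_key: str, out_path: str = "speech.mp3") -> str:
    """Send the request and write the returned audio bytes to disk. Hypothetical helper."""
    req = build_tts_request(text, voice_id, api_key)
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
    return out_path
```

Splitting request building from sending keeps the network call in one place, so the request can be inspected or logged before any audio is generated.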
🧨 GOOGLE 🧨
ZERO-SHOT MONO-TO-BINAURAL SPEECH SYNTHESIS: A LEAP FORWARD IN IMMERSIVE AUDIO TECHNOLOGY

Google has announced a breakthrough method for converting mono audio recordings into binaural audio (two-channel audio that reproduces what each ear would hear) using a zero-shot approach. This removes the need for large-scale binaural training datasets and opens new possibilities for immersive audio experiences in AR and VR.
Zero-Shot Approach: Google’s method synthesizes binaural sound from mono recordings without requiring binaural training data, marking a significant advance over traditional DSP methods.
Three-Stage Architecture: The process involves geometric time warping, amplitude scaling, and a denoising vocoder to generate perceptually accurate binaural audio.
Dataset Innovation: A new dataset, TUT Mono-to-Binaural, is introduced to support this method, providing valuable data for future improvements in audio synthesis. View their samples here
Superior Performance: Despite the challenges of room acoustics and background noise, Google’s method outperforms traditional models in real-world scenarios.
This innovation could transform fields like virtual and augmented reality by delivering high-quality, immersive audio with minimal data requirements. Binaural capture, recording with two microphones positioned like ears, aims to mimic human hearing, which makes it a natural fit for any system that listens with two ears … like a ROBOT!
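To make the first two stages more concrete, here is a heavily simplified DSP sketch (not Google's model) of geometric time warping and amplitude scaling: each ear receives the mono signal delayed by its distance to the source divided by the speed of sound, and scaled by an inverse-distance gain. The ear positions, the 343 m/s speed of sound, the linear-interpolation fractional delay and the 1/distance gain are our assumptions; the published method adds a denoising vocoder and accounts for effects like head shadowing and room acoustics that this ignores.

```python
import math
from typing import List, Sequence, Tuple

SPEED_OF_SOUND = 343.0  # metres per second, roughly room temperature (assumption)

Point = Tuple[float, float, float]


def _delay_and_scale(mono: Sequence[float], delay_samples: float, gain: float) -> List[float]:
    """Delay the signal by a fractional number of samples (linear interpolation), then scale it."""
    out = []
    for n in range(len(mono)):
        t = n - delay_samples            # position in the mono signal this output sample reads from
        i = math.floor(t)
        frac = t - i
        x0 = mono[i] if 0 <= i < len(mono) else 0.0
        x1 = mono[i + 1] if 0 <= i + 1 < len(mono) else 0.0
        out.append(gain * ((1.0 - frac) * x0 + frac * x1))
    return out


def mono_to_binaural(mono: Sequence[float], sample_rate: int, source: Point,
                     left_ear: Point, right_ear: Point,
                     ref_distance: float = 1.0) -> Tuple[List[float], List[float]]:
    """Toy mono-to-binaural conversion from source and ear geometry only."""
    left_d = math.dist(source, left_ear)
    right_d = math.dist(source, right_ear)
    if min(left_d, right_d) <= 0.0:
        raise ValueError("source must not coincide with an ear position")
    # Stage 1 (geometric time warping): each ear hears the sound distance / c seconds later.
    left_delay = left_d / SPEED_OF_SOUND * sample_rate
    right_delay = right_d / SPEED_OF_SOUND * sample_rate
    # Stage 2 (amplitude scaling): inverse-distance gain relative to ref_distance.
    left = _delay_and_scale(mono, left_delay, ref_distance / left_d)
    right = _delay_and_scale(mono, right_delay, ref_distance / right_d)
    return left, right
```

For a source 1 m to the right of the head, the right channel arrives a few samples earlier and slightly louder than the left. These interaural time and level differences are the cues our brains use to localize sound, which is why even this toy version already sounds "placed".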
🗞️ NEWSROOM 🗞️
What’s hot in tech right now?
AI Investment In Europe: The 10 Biggest Deals in 2024
Highlights Europe's rising AI sector, with nearly €3B raised in 2024, led by France, Germany, and the UK. It covers the top 10 AI funding deals and predicts further growth fueled by innovations and regulatory advancements.
NVIDIA Releases NIM Microservices to Safeguard Applications for Agentic AI
Introduces NIM microservices within NeMo Guardrails, enhancing AI security, accuracy, and scalability. These tools help businesses build safer, more controlled AI agents, with tutorials to get you started using NVIDIA GPU-accelerated AI.
Mira Murati’s AI Startup Makes First Hires, Including Former OpenAI Executive
Covers Mira Murati's new AI startup making its first major hires, including ex-OpenAI executive Jonathan Lachman. The company, focused on AGI research, has attracted top talent from AI giants like OpenAI and Google.
⭐️ TUTORIAL with Moondream ⭐️
What will we learn today?
We recently posted about Moondream’s new model release, Moondream2. We connected online, and the friendly people at Moondream made a quick demo to show how easy their model is to use. Because their API is OpenAI-compatible, it slots into any project that already uses the OpenAI Python client: point `base_url` at Moondream and pass your Moondream API key. Their model is tailored for edge devices (e.g. mobile).
```python
import base64
from openai import OpenAI

# Setup client
client = OpenAI(
    base_url="https://api.moondream.ai/v1",
    api_key="your-moondream-key",
)

# Load and encode image
with open("image.jpg", "rb") as f:
    base64_image = base64.b64encode(f.read()).decode("utf-8")

# Make request
response = client.chat.completions.create(
    model="moondream-2B",
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
            },
            {"type": "text", "text": "Describe this image"},
        ],
    }],
)

print(response.choices[0].message.content)
```
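If you plan to send more than one image, the same call can be wrapped in a small helper. This is a sketch of ours, not part of Moondream's demo: `to_data_url`, `image_to_data_url` and `describe_image` are hypothetical names, and the helper picks the data-URL MIME type from the file extension with Python's `mimetypes` module instead of hard-coding `image/jpeg`. It reuses the `model="moondream-2B"` value from the demo and takes an already configured OpenAI-style client as a parameter.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(data: bytes, filename: str) -> str:
    """Encode raw image bytes as a base64 data URL, guessing the MIME type from the filename."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Not a recognised image file: {filename}")
    encoded = base64.b64encode(data).decode("utf-8")
    return f"data:{mime};base64,{encoded}"


def image_to_data_url(path: str) -> str:
    """Read a local image file and return it as a data URL."""
    return to_data_url(Path(path).read_bytes(), path)


def describe_image(client, path: str, prompt: str = "Describe this image",
                   model: str = "moondream-2B") -> str:
    """Send one image plus a text prompt through an OpenAI-compatible chat endpoint."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_to_data_url(path)}},
                {"type": "text", "text": prompt},
            ],
        }],
    )
    return response.choices[0].message.content
```

Usage: `describe_image(client, "photo.png", "What is in the top left corner?")`, with `client` created as in the demo. Passing the client in, rather than creating it inside the helper, makes it easy to swap endpoints or reuse one connection.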
Here is their Reddit post to read more.
Thanks, Moondream AI!
Have a product and want to contribute?
Or want to learn about a topic?
👉️ tell us
⭐️ BUILDER BYTES ⭐️
What’s hot for builders right now?
MODEL: Moondream2: Small Vision-Language Model for Edge Devices
An efficient vision-language model optimized for edge devices, supporting tasks like captioning, visual querying, object detection, and pointing with real-time performance.
BLOG: Building Knowledge Graph Agents with LlamaIndex Workflows
This blog explores using LlamaIndex Workflows for creating robust knowledge graph agents, improving the text2cypher process with multi-step workflows and self-correction mechanisms to enhance query accuracy.
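The self-correction idea comes down to a generate, check, retry loop. Below is a framework-free sketch of that loop in plain Python, not the LlamaIndex Workflows API: `generate` stands in for an LLM call that drafts a Cypher query and `validate` for a check against the graph database (for example running the query with `EXPLAIN`). Both callables, the `Text2CypherResult` type and the helper name `text2cypher_with_retries` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical signatures: generate(question, feedback) -> query; validate(query) -> error or None.
Generator = Callable[[str, Optional[str]], str]
Validator = Callable[[str], Optional[str]]


@dataclass
class Text2CypherResult:
    query: str
    ok: bool
    attempts: List[str] = field(default_factory=list)


def text2cypher_with_retries(question: str, generate: Generator, validate: Validator,
                             max_attempts: int = 3) -> Text2CypherResult:
    """Draft a Cypher query, validate it, and feed any error back into the next draft."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    feedback: Optional[str] = None
    attempts: List[str] = []
    for _ in range(max_attempts):
        query = generate(question, feedback)   # step 1: draft (or redraft) the query
        attempts.append(query)
        error = validate(query)                # step 2: check it against the database
        if error is None:
            return Text2CypherResult(query, True, attempts)
        feedback = f"Previous query failed with: {error}"  # step 3: self-correction signal
    return Text2CypherResult(attempts[-1], False, attempts)
```

In a workflow framework each of these steps would typically become its own step or event handler so it can be logged and retried independently; the loop here only shows the control flow.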
TUTORIAL: Two-hour walk-through: Building an Agentic AI App with CrewAI
This video emphasizes the significance of foundation models in AI development, exploring their transformative role in various applications and future directions.
PYTHON: Parlant: Text to Speech Library
Parlant is an open-source text-to-speech library that aims to provide a fast, efficient, and customizable solution for integrating speech synthesis into applications.
TYPESCRIPT: AgentScript: TAI Development for Dynamic Systems
AgentScript is a platform for developing autonomous agents, enabling real-time decision-making and system optimization using AI-driven techniques for dynamic environments.
🤩 COMMUNITY 🤩
What’s the latest beat?
TALK | MIT: Machine Learning with Python: From Linear Models to Deep Learning
TOOL | Eachlabs: Build AI Workflows in Seconds
WEBINAR | Google Workspace: AI-powered collaboration for organizations of all sizes
THANK YOU
👇️ we are 100% free, so please let us know what you think! 👇️