Gradient: Complete Guide - Progressive Robot

LangChain Meets Gradient AI™: Open-Source, Serverless, and Fast

URL: https://www.progressiverobot.com/langchain-gradient/

When you’re building AI applications, the right tooling makes all the difference. LangChain has been a go-to framework for years, and its rich ecosystem of integrations helps developers move from idea to production very quickly.

With langchain-gradient, the cloud provider’s official integration for LangChain, you can pair Gradient AI’s serverless inference with LangChain’s agents, tools, and chains.

In this guide, you’ll learn why langchain-gradient helps developers improve their agent workflow, how to connect Gradient AI’s serverless inference to LangChain in minutes, and how to use invoke and stream with concise examples.

What is LangChain-Gradient?

The new langchain-gradient integration can improve your workflows in a number of ways:

Simple drop-in for existing LangChain code: Swap in Gradient AI endpoints with a few lines—no rewrites, no refactors, just plug-and-play.
Familiar LangChain abstractions (Chains, Tools, Agents): Build with the primitives you already know—compose chains, plug in tools, and spin up agents without changing your workflow.
Choose from multiple model options: Instantly access multiple AI models with GPU-accelerated, serverless inference on the cloud provider.
Stay open and flexible: The package is fully open-source and designed to work with the latest versions of LangChain and Gradient AI Platform.

LangChain maintains its own documentation for the integration, and the package is also published on PyPI.

You can also watch a short walkthrough with code examples that shows the integration in action.

Getting a cloud provider API Key

To run langchain-gradient, you’ll first need a key from the cloud provider Cloud console:

Log in to the cloud provider Cloud console.
Open Agent Platform → Serverless Inference.
Click “Create model access key,” name it, and create the key.
Use the generated key as the value of the cloud provider_INFERENCE_KEY.

Export your key as an environment variable:

export the cloud provider_INFERENCE_KEY="your_key_here"
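
A missing or empty key tends to surface only later, as an authentication error, so it can help to check for it before creating the client. The following is a minimal sketch, not part of langchain-gradient: `require_env` is a hypothetical helper, and `DEMO_INFERENCE_KEY` is a placeholder variable name used only for this demonstration.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; export it before creating the client."
        )
    return value


# Demonstration only: DEMO_INFERENCE_KEY is a placeholder, not a real variable name.
os.environ.setdefault("DEMO_INFERENCE_KEY", "your_key_here")
api_key = require_env("DEMO_INFERENCE_KEY")
```

In a real script you would pass the name of the variable you exported above and hand the returned value to the client's api_key argument.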

Installing LangChain-Gradient

To install the package, run the following command:

pip install langchain-gradient

Available Functions

invoke: simple, single-shot calls
Use this when you want a one-off completion and are okay waiting for the full response before handling it. It returns a complete string/message object after the model finishes generating. Ideal for synchronous scripts, batch jobs, or server endpoints that respond only once per request.
stream: token streaming for responsive UIs/logging
Use this when you want partial output as it’s generated. It yields chunks/tokens incrementally, enabling real-time display in terminals, notebooks, or chat apps, and is helpful for progress logging or early-cancel scenarios.
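
To make the difference between the two call styles concrete, here is a small sketch that runs without any network access. `EchoModel` is a hypothetical stand-in written for this illustration; it only mimics the shape of the two methods (one returns everything at once, the other yields pieces) and is not the real client.

```python
from typing import Iterator


class EchoModel:
    """Hypothetical stand-in that mimics the invoke/stream method names."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    def invoke(self, prompt: str) -> str:
        # Single-shot: the caller receives nothing until the whole reply is ready.
        return self.reply

    def stream(self, prompt: str) -> Iterator[str]:
        # Streaming: hand back the reply piece by piece so the caller can act early.
        for word in self.reply.split(" "):
            yield word + " "


model = EchoModel("octopuses have three hearts")

full = model.invoke("Tell me a fact.")          # one complete value
pieces = list(model.stream("Tell me a fact."))  # several partial values

# A streaming consumer can also stop early, for example after two pieces.
first_two = []
for piece in model.stream("Tell me a fact."):
    first_two.append(piece)
    if len(first_two) == 2:
        break
```

Joining the streamed pieces gives the same text as the single-shot call; the only difference is when the caller gets to see it.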

Using Invoke

				
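import os

from langchain_gradient import ChatGradient

# Create the chat client; the API key is read from the environment.
llm = ChatGradient(
    model="llama3.3-70b-instruct",
    api_key=os.getenv("the cloud provider_INFERENCE_KEY"),
)

# invoke() blocks until the model has finished generating.
result = llm.invoke(
    "Summarize the plot of the movie 'Inception' in two sentences, and then explain its ending."
)

# The result is a message object; the generated text is in .content.
print(result.content)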

Imports: ChatGradient is the LangChain-compatible chat model client for Gradient AI.
llm = ChatGradient(…): Creates a client instance.
model: Set to "llama3.3-70b-instruct". This can be any model available on the Gradient AI Platform.
api_key: Reads your inference API key for the cloud provider from the environment.
result = llm.invoke("…"): Sends the prompt to the selected model and returns the generated response once it is complete.
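
Since a response may arrive either as a plain string or as a message object, downstream code sometimes benefits from normalizing it to text first. The sketch below shows one way to do that. `to_text` is a hypothetical helper, and it assumes that message objects expose their text through a `content` attribute, as LangChain chat messages do; the `SimpleNamespace` object merely stands in for a real response.

```python
from types import SimpleNamespace
from typing import Any


def to_text(result: Any) -> str:
    """Return the text of a response, whether it is a string or a message-like object."""
    if isinstance(result, str):
        return result
    # Assumption: message-like objects carry their text in a `content` attribute.
    content = getattr(result, "content", None)
    return content if isinstance(content, str) else str(result)


# Stand-in for a real response object, used only for this demonstration.
demo_message = SimpleNamespace(content="A thief plants an idea inside a dream.")
print(to_text(demo_message))
```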

Using Streaming

				
import os

from langchain_gradient import ChatGradient

llm = ChatGradient(
    model="llama3.3-70b-instruct",
    api_key=os.getenv("the cloud provider_INFERENCE_KEY"),
)

# stream() yields message chunks as the model produces them.
for chunk in llm.stream("Give me three fun facts about octopuses."):
    print(chunk.content, end="", flush=True)

llm.stream("…"): Requests a streamed response for the prompt.
for chunk in …: Iterates over incremental tokens/chunks and prints them in real time.

This prints tokens as they arrive, which is perfect for CLIs, notebooks, or chat UIs.
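
In practice you often want both things at once: show the output as it arrives and keep the full text for logging or storage afterwards. The following sketch is one possible approach, not part of the package. `collect_stream` and `show` are hypothetical helpers, and the list of strings stands in for a real stream; chunks are assumed to be either plain strings or objects with a `content` attribute.

```python
from typing import Any, Callable, Iterable


def collect_stream(chunks: Iterable[Any], on_text: Callable[[str], None]) -> str:
    """Forward each chunk's text to `on_text` and return the accumulated text."""
    parts = []
    for chunk in chunks:
        # Assumption: a chunk is a plain string or an object with a `content` attribute.
        text = chunk if isinstance(chunk, str) else str(getattr(chunk, "content", ""))
        on_text(text)
        parts.append(text)
    return "".join(parts)


def show(text: str) -> None:
    # Print without a newline so the pieces read as one continuous line.
    print(text, end="", flush=True)


# A list of strings stands in for a real stream in this demonstration.
full_text = collect_stream(["Octopuses ", "have ", "three hearts."], show)
```

With a real client, the first argument would be the iterator returned by the stream call.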

FAQs

What is LangChain?

LangChain is a framework for building applications powered by large language models. It provides standard abstractions (chains, tools, agents) and a large ecosystem of integrations to help developers compose end-to-end LLM apps quickly.

What is langchain-gradient?

It’s the cloud provider’s official integration, which lets LangChain apps call Gradient AI’s serverless inference endpoints using a LangChain-compatible ChatGradient client.

Which models can I use?

You can select from multiple Gradient AI-hosted models (e.g., Llama variants). Choose a model ID from Gradient’s documentation and pass it to ChatGradient via the model parameter.

How do I authenticate?

Create a model access key in the cloud provider Cloud console (Agent Platform → Serverless Inference), then export it as the cloud provider_INFERENCE_KEY and pass it to ChatGradient.

Does it support streaming?

Yes. Use llm.stream(…) to receive tokens incrementally—ideal for CLIs, notebooks, and chat UIs. Use llm.invoke(…) for simple single-shot calls.

Conclusion

langchain-gradient makes it fast and practical to go from idea to production. With a drop-in client, familiar LangChain abstractions, and GPU-accelerated serverless inference on the cloud provider, you can prototype quickly, stream results in real time, and scale without refactoring. The integration is open-source and designed to work with the latest versions of LangChain and the Gradient AI Platform.