A high-performance, unified platform for Scientific Machine Learning and LLM inference, designed for accuracy, speed, and scalability.
Nexa_Inference combines multiple scientific domains and cutting-edge language models into a single, easy-to-use API.
Predict protein secondary and tertiary structures from amino acid sequences to accelerate drug discovery and protein design.
Use Graph Neural Networks to predict key properties of novel materials, such as band gap and formation energy, for battery and semiconductor research.
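The protein and materials endpoints follow the same request pattern as the LLM quickstart below. The routes and payload fields in this sketch are assumptions for illustration, not confirmed API paths; it builds the requests without sending them:

```python
import requests

API_KEY = "your_api_key"
BASE = "https://api.nexa_inference.com/v1"

# Hypothetical route and fields -- check the API reference for the real schema.
protein_req = requests.Request(
    "POST", f"{BASE}/protein/predict",
    headers={"X-API-Key": API_KEY},
    json={"sequence": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"},  # amino acid sequence
).prepare()

# Hypothetical route for GNN-based property prediction.
materials_req = requests.Request(
    "POST", f"{BASE}/materials/predict",
    headers={"X-API-Key": API_KEY},
    json={"formula": "LiFePO4", "properties": ["band_gap", "formation_energy"]},
).prepare()

# Send with requests.Session().send(protein_req) once you have a valid key.
print(protein_req.url)
```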
Leverage a powerful language model for text generation, summarization, and analysis, complete with detailed performance metrics for your application.
Our models are optimized for efficiency and security using the latest techniques and tools from the NexaToolKit project.
All models are quantized to reduce their memory footprint and accelerate inference, with minimal loss in accuracy. This enables deployment on a wider range of hardware.
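To illustrate why quantization shrinks the footprint, here is a generic symmetric int8 scheme in NumPy. This is a didactic sketch, not Nexa_Inference's actual quantization method:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ~ scale * q."""
    scale = float(np.abs(weights).max()) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(1024, 1024).astype(np.float32)
q, scale = quantize_int8(w)

print(w.nbytes // q.nbytes)  # 4: int8 weights use a quarter of float32 memory
# Round-trip error is bounded by half a quantization step per weight.
print(float(np.abs(w - dequantize(q, scale)).max()) <= scale)  # True
```

The 4x memory reduction is exactly why quantized models fit on smaller GPUs and edge devices; the accuracy cost is bounded by the quantization step size.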
We use the safetensors format for all models. This provides a secure, fast, and memory-efficient way to store and load tensors, protecting against malicious code and improving performance.
Use the /api/benchmark endpoint to test model throughput and latency on your own hardware infrastructure, ensuring predictable performance in your environment.
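A benchmark call can be assembled like the quickstart request. The host and payload fields below are assumptions for illustration; the sketch builds the request without sending it:

```python
import requests

# Payload fields here are illustrative -- consult the API reference for the
# exact benchmark parameters your plan supports.
req = requests.Request(
    "POST", "https://api.nexa_inference.com/api/benchmark",
    headers={"X-API-Key": "your_api_key"},
    json={"model": "llm", "num_requests": 50, "concurrency": 4},
).prepare()

# Sending with requests.Session().send(req) would return throughput and
# latency statistics measured against your infrastructure.
print(req.url)
```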
Get up and running in minutes. Send a POST request to our LLM endpoint and receive a response with performance data.
import requests

# Call the LLM endpoint
response = requests.post(
    "https://api.nexa_inference.com/v1/llm/predict",
    headers={"X-API-Key": "your_api_key"},
    json={
        "prompt": "Explain the significance of protein folding.",
        "max_tokens": 100
    }
)

# Print the results
print(response.json())
# {
#     "response": "Protein folding is a critical biological process...",
#     "tokens_generated": 100,
#     "metrics": {
#         "total_latency_ms": 512.43,
#         "time_to_first_token_ms": 85.12,
#         "tokens_per_second": 195.15
#     }
# }