Nexa_Inference

A high-performance, unified platform for Scientific Machine Learning and LLM inference, designed for accuracy, speed, and scalability.

One Platform, Infinite Discoveries

Nexa_Inference combines multiple scientific domains and cutting-edge language models into a single, easy-to-use API.

Biology & Drug Discovery

Predict protein secondary and tertiary structures from amino acid sequences to accelerate drug discovery and protein design.
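As a sketch of what a call might look like (the `/v1/bio/predict` path and the `sequence` and `prediction` fields are illustrative assumptions, not confirmed endpoint names):

```python
import requests

# Hypothetical request body -- the field names below are assumptions,
# not the documented schema.
payload = {
    "sequence": "MKTAYIAKQRQISFVKSHFSRQ",  # amino acids, one-letter codes
    "prediction": "secondary",             # or "tertiary"
}

def predict_structure(api_key: str) -> dict:
    """POST the sequence to the (assumed) biology endpoint and return JSON."""
    response = requests.post(
        "https://api.nexa_inference.com/v1/bio/predict",  # assumed path
        headers={"X-API-Key": api_key},
        json=payload,
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```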

Materials Science

Use Graph Neural Networks to predict key properties of novel materials, such as band gap and formation energy, for battery and semiconductor research.
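A request might look like the following sketch (the `/v1/materials/predict` path, `formula`, and `properties` names are placeholders; check the API reference for the real schema):

```python
import requests

# Hypothetical request body -- field names are illustrative only.
payload = {
    "formula": "LiFePO4",                           # candidate cathode material
    "properties": ["band_gap", "formation_energy"]  # properties to predict
}

def predict_properties(api_key: str) -> dict:
    """POST a material to the (assumed) materials endpoint and return JSON."""
    response = requests.post(
        "https://api.nexa_inference.com/v1/materials/predict",  # assumed path
        headers={"X-API-Key": api_key},
        json=payload,
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```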

LLM Inference

Leverage a powerful language model for text generation, summarization, and analysis, complete with detailed performance metrics for your application.

Advanced Model Technology

Our models are optimized for efficiency and security using the latest techniques and tools from the NexaToolKit project.

Model Quantization

All models are quantized to reduce their memory footprint and accelerate inference speed, without significant loss in accuracy. This enables deployment on a wider range of hardware.
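To illustrate the idea, here is a minimal symmetric int8 quantization sketch in NumPy; this shows the general technique, not the specific scheme used by Nexa_Inference:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 weights to int8 plus a single scale factor."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights from int8 values."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, scale = quantize_int8(w)

# int8 storage is 4x smaller than float32; the rounding error is bounded
# by half the scale factor per weight.
error = np.abs(dequantize(q, scale) - w).max()
```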

SafeTensors

We use the safetensors format for all models. Unlike pickle-based checkpoints, safetensors files contain only raw tensor data, so loading them cannot execute arbitrary code; the format is also fast and memory-efficient to read.

On-Demand Benchmarking

Use the /api/benchmark endpoint to test model throughput and latency on your own hardware infrastructure, ensuring predictable performance in your environment.
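A benchmark run might be triggered like this (the `model` and `num_requests` parameter names are assumptions; consult the API reference for the actual request schema):

```python
import requests

# Hypothetical benchmark request body -- parameter names are illustrative.
payload = {"model": "llm", "num_requests": 50}

def run_benchmark(api_key: str) -> dict:
    """POST to the benchmark endpoint and return throughput/latency JSON."""
    response = requests.post(
        "https://api.nexa_inference.com/api/benchmark",
        headers={"X-API-Key": api_key},
        json=payload,
        timeout=120,  # benchmarks can take a while
    )
    response.raise_for_status()
    return response.json()
```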

API Quickstart

Get up and running in minutes. Send a POST request to our LLM endpoint and receive a response with performance data.


import requests

# Call the LLM endpoint
response = requests.post(
    "https://api.nexa_inference.com/v1/llm/predict",
    headers={"X-API-Key": "your_api_key"},
    json={
        "prompt": "Explain the significance of protein folding.",
        "max_tokens": 100
    },
    timeout=30,  # avoid hanging indefinitely on network issues
)
response.raise_for_status()  # surface HTTP errors early

# Print the results
print(response.json())

# {
#   "response": "Protein folding is a critical biological process...",
#   "tokens_generated": 100,
#   "metrics": {
#     "total_latency_ms": 512.43,
#     "time_to_first_token_ms": 85.12,
#     "tokens_per_second": 195.15
#   }
# }
            
View Full Documentation