Custom AI/ML Models

A Suite of Purpose-Built Models for Human Behavior

TimeStack operates 9 production AI models spanning large language models, temporal transformers, graph neural networks, reinforcement learning agents, and anomaly detection systems — all trained and optimized on NVIDIA GPU infrastructure.

9 Production Models, One Unified Intelligence

Each model specializes in a distinct aspect of behavioral intelligence. Together, they form a comprehensive system that understands, predicts, and guides human behavior.

Large Language Models
2 models
  • TimeStack Behavioral LLM (coaching & reasoning)
  • Goal Decomposition LLM (structured planning)
Temporal Models
2 models
  • Chronos Prediction Network (multi-horizon forecasting)
  • Circadian Rhythm Model (daily energy prediction)
Graph & Relational
2 models
  • DomainGraph GNN (cross-domain causality)
  • Social Influence Network (tribe dynamics)
NLP & Classification
1 model
  • Multi-task NLP Pipeline (intent, sentiment, NER)
Decision & Control
1 model
  • Intervention Optimizer (RL-based timing)
Safety & Monitoring
1 model
  • Wellbeing Sentinel (anomaly detection)
Model 01 — Large Language Model

TimeStack Behavioral LLM

A domain-specialized large language model that understands human behavioral context, generates coaching interventions, and reasons across life domains with the nuance of an expert behavioral coach.

Architecture

Built on the LLaMA-3 architecture with custom modifications for behavioral reasoning. We add a temporal position encoding layer that enables the model to reason about time-dependent behavioral patterns — understanding that a goal set 3 months ago has different context than one set yesterday. The model also includes a domain-aware attention head that specializes in cross-domain reasoning (e.g., understanding how career stress impacts health goals).

Base: LLaMA-3 8B / 70B (multi-size deployment)
Context: 32K tokens (extended via RoPE scaling)
Custom Layers: Temporal PE + Domain-aware Attention
Vocabulary: Extended with 2,400 behavioral domain tokens
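
The temporal position encoding can be illustrated with a minimal sketch. The production layer is learned end-to-end inside the transformer; the dimension and frequency base below are illustrative assumptions, not production values:

```python
import math

def temporal_position_encoding(days_elapsed: float, dim: int = 8) -> list[float]:
    """Sinusoidal features of elapsed time at geometrically spaced frequencies,
    letting attention distinguish a goal set yesterday from one set months ago.
    dim=8 and the 10000 frequency base are illustrative, not production values."""
    enc = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))  # lower frequency for higher i
        enc.append(math.sin(days_elapsed * freq))
        enc.append(math.cos(days_elapsed * freq))
    return enc

yesterday = temporal_position_encoding(1.0)
last_quarter = temporal_position_encoding(90.0)
```

Because each frequency wraps at a different rate, the two vectors differ at every scale, giving the attention layers a usable notion of "how long ago."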

Training Data

The model is fine-tuned on a curated corpus spanning multiple behavioral domains:

  • Behavioral Science Literature: 50K+ papers on habit formation, goal-setting theory, behavioral economics, positive psychology
  • Coaching Transcripts: 100K+ anonymized sessions from certified coaches across ICF, NBHWC, and AACP frameworks
  • Goal Decomposition Data: 500K+ examples of hierarchical goal breakdowns with expert annotations
  • Behavioral Sequences: Synthetic sequences modeling realistic human behavior patterns across all 8 life domains
  • Intervention Outcomes: Labeled examples of effective vs. ineffective behavioral interventions with outcome data

Fine-tuning Pipeline

1
Continued Pre-training (CPT)

100B tokens of behavioral corpus. 8x H100 80GB, ~72 hours. Trains domain knowledge into base weights.

2
Supervised Fine-tuning (SFT)

200K instruction pairs for coaching, goal decomposition, and domain reasoning. LoRA rank-64 adapters. 4x H100, ~8 hours.

3
RLHF Alignment

Reward model trained on 50K preference pairs from expert coaches. PPO alignment with NeMo-Aligner. 8x H100, ~24 hours.

4
Safety Filtering

Red-team testing for harmful outputs. Constitutional AI constraints for health/mental-health topics. Guardrails for scope limitation.
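
The LoRA adapters used in stage 2 can be sketched as a low-rank update added to a frozen base weight. The toy dimensions and scaling below are illustrative; the production adapters are rank-64 over the LLaMA projection matrices:

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def lora_forward(x, W, A, B, alpha=16.0, r=2):
    """y = W x + (alpha / r) * B (A x): frozen base weight W plus a
    rank-r update B @ A, the only part trained during SFT."""
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))  # A: r x d_in, B: d_out x r
    scale = alpha / r
    return [b + scale * l for b, l in zip(base, low_rank)]

# toy 2-d example: frozen identity base weight, adapter with B zero-initialized
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.1, 0.2], [0.0, 0.3]]   # r=2, d_in=2
B = [[0.0, 0.0], [0.0, 0.0]]   # zero-init => adapter is a no-op at start
y = lora_forward([1.0, 2.0], W, A, B)
```

Zero-initializing B means training starts exactly at the base model's behavior; only A and B receive gradients, which is what keeps the per-stage GPU cost low.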

Capabilities

Goal Reasoning

Decomposes abstract life visions ("I want to be healthier") into concrete, time-bound milestone hierarchies across relevant domains, accounting for cross-domain dependencies.

Contextual Coaching

Generates personalized motivational interventions grounded in the user's behavioral history, current energy level, recent achievements, and upcoming commitments.

Journal Analysis

Processes free-form journal entries to extract behavioral themes, emotional patterns, domain-specific insights, and connections the user may not consciously recognize.

Cross-Domain Reasoning

Identifies and explains causal links between life domains — e.g., how irregular sleep patterns (Health) are impacting deep work capacity (Career) and relationship quality (Relationships).

Model 02 — Structured Generation LLM

Goal Decomposition Engine

A smaller, specialized LLM trained for structured output — converting high-level life goals into hierarchical plans with temporal dependencies, resource requirements, and measurable milestones.

Architecture & Purpose

While the Behavioral LLM handles free-form coaching, the Goal Decomposition Engine is optimized for structured JSON output. Built on a LLaMA-3 3B base (smaller for speed), it uses constrained decoding to ensure valid goal hierarchies with proper temporal ordering and dependency graphs.

Base: LLaMA-3 3B (speed-optimized)
Output: Constrained JSON (goal schema)
Latency: <500ms full decomposition
Training: SFT on 100K decomposition examples

Output Structure

example output schema
{
  "vision": "Become a published author",
  "horizon": "12_months",
  "milestones": [
    {
      "title": "Complete first draft",
      "deadline": "2026-06",
      "domain": "learning",
      "sprints": [
        {
          "title": "Establish daily writing habit",
          "weeks": 4,
          "weekly_goals": [
            "Write 500 words daily (Mon-Fri)",
            "Read 1 chapter of craft book weekly"
          ],
          "dependencies": [],
          "difficulty": 0.6
        }
      ]
    }
  ],
  "cross_domain_impacts": {
    "career": 0.3,
    "joy": 0.7,
    "relationships": -0.1
  }
}
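
Downstream consumers can defensively validate a generated plan against this schema. A minimal sketch, using the key names from the example above; the validation logic itself is an illustrative assumption, not the production constrained decoder:

```python
import json

# expected top-level fields and their JSON types (from the example schema)
REQUIRED = {"vision": str, "horizon": str, "milestones": list,
            "cross_domain_impacts": dict}

def validate_decomposition(raw: str) -> dict:
    """Parse model output and check the top-level goal schema.
    Raises ValueError on missing or mistyped fields."""
    plan = json.loads(raw)
    for key, expected in REQUIRED.items():
        if not isinstance(plan.get(key), expected):
            raise ValueError(f"field {key!r} missing or not {expected.__name__}")
    for milestone in plan["milestones"]:
        if "title" not in milestone or "sprints" not in milestone:
            raise ValueError("milestone missing title/sprints")
    return plan

plan = validate_decomposition(
    '{"vision": "v", "horizon": "12_months", '
    '"milestones": [{"title": "m", "sprints": []}], '
    '"cross_domain_impacts": {}}'
)
```

Constrained decoding makes invalid output rare at generation time; a validation pass like this is still a cheap safety net at the API boundary.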
Model 03 — Temporal Prediction

Chronos Prediction Network

A multi-scale temporal fusion transformer that models human behavioral patterns across 5 time horizons — predicting everything from today's energy curve to 12-month goal completion probability.

Architecture

Based on the Temporal Fusion Transformer (TFT) architecture, extended with multi-horizon attention heads and domain-conditioned gating. The model processes variable-length behavioral time series with irregular intervals (humans don't check in at exact intervals) using our custom temporal windowed convolution CUDA kernel.

Architecture: Extended Temporal Fusion Transformer
Input Features: 128 behavioral signals per timestep
Horizons: Daily / Weekly / Sprint / Quarterly / Annual
Output: Probabilistic forecasts (quantile regression)
Parameters: 45M
Inference: <30ms on TensorRT INT8
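
Quantile regression means each forecast head is trained with the pinball loss; a minimal per-sample sketch:

```python
def pinball_loss(y_true: float, y_pred: float, q: float) -> float:
    """Pinball (quantile) loss: under-prediction is penalized by q,
    over-prediction by (1 - q), so minimizing it yields the q-th quantile."""
    err = y_true - y_pred
    return max(q * err, (q - 1.0) * err)

# a 0.9-quantile forecast is penalized 9x more for under- than over-predicting
under = pinball_loss(10.0, 8.0, 0.9)   # forecast too low
over = pinball_loss(10.0, 12.0, 0.9)   # forecast too high
```

Training several heads at different q values (e.g. 0.1, 0.5, 0.9) is what turns a point forecast into a calibrated uncertainty band around each predicted trajectory.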

What It Predicts

Daily
  • Energy levels by hour
  • Optimal focus windows
  • Task completion probability per scheduled task
  • Distraction risk score by time block
Weekly
  • Domain balance trajectory
  • Streak continuation probability
  • Burnout risk index
  • Social engagement prediction
Sprint (90d)
  • Goal completion trajectories
  • Habit formation curves
  • Behavioral pattern stability
  • Intervention effectiveness decay
Quarterly / Annual
  • Life domain balance forecast
  • Long-term trend projections
  • Milestone achievability scores
  • Compound growth trajectory

Input Signals (128 features per timestep)

Behavioral

Task completions, check-in responses, goal progress deltas, streak status, focus session durations, distraction counts

Temporal

Time of day, day of week, days since last check-in, time since goal creation, seasonal patterns

Affective

Self-reported mood, energy levels, journal sentiment scores, emotional volatility index

Contextual

Active domain counts, concurrent goal load, social interaction frequency, reward redemption patterns

Model 04 — Graph Neural Network

DomainGraph Neural Network

A personalized graph attention network that learns causal interdependencies between 8 life domains — the core innovation that enables TimeStack's whole-life intelligence.

The Core Insight

Human life domains are not independent. Sleep quality affects work performance. Financial stress impacts relationships. Exercise boosts mood and learning capacity. Existing productivity tools do not model these interactions; each domain is treated in isolation.

DomainGraph learns a personalized causal graph for each user, where nodes represent domains and edges encode the strength and direction of causal influence. The model discovers that for User A, career stress strongly impacts health (negative edge), while for User B, the dominant pathway is health → career (exercise improves focus).

Architecture

Type: Graph Attention Network v2 (GATv2)
Nodes: 8 domains + 40 sub-category nodes
Edges: Learned attention weights (directed, signed)
Layers: 4 GAT layers + 2 readout MLP layers
Attention: Custom cross-domain sparse (CUDA kernel)
Parameters: 12M (shared) + 0.5M (per-user adapter)
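
A minimal sketch of the GATv2 attention logit between two domain nodes. Toy dimensions and hand-picked weights; the production model is multi-head and runs through the custom sparse CUDA kernel:

```python
import math

def leaky_relu(v, slope=0.2):
    return v if v >= 0 else slope * v

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def gatv2_logit(h_src, h_dst, W_src, W_dst, a):
    """GATv2 attention logit: a . LeakyReLU(W_src h_src + W_dst h_dst).
    Applying the nonlinearity before the dot product with `a` is what
    makes GATv2's attention strictly more expressive than GATv1's."""
    z = [leaky_relu(s + d)
         for s, d in zip(matvec(W_src, h_src), matvec(W_dst, h_dst))]
    return sum(ai * zi for ai, zi in zip(a, z))

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# attention of a "career" node over two toy neighbors (health, sleep)
W = [[0.5, 0.0], [0.0, 0.5]]
a = [1.0, -1.0]
career, health, sleep = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
weights = softmax([gatv2_logit(n, career, W, W, a) for n in (health, sleep)])
```

The softmax-normalized weights are the directed edge strengths of the per-user graph; signed influence comes from the downstream message values, not the weights themselves.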

What It Enables

Impact Prediction

"If you skip workouts this week, your Career domain score is predicted to drop 15% over the next 2 weeks based on your personal pattern."

Balance Optimization

"Your Relationships domain is underserved. Based on your graph, investing 2 hours here would positively impact Joy (+12%) and Growth (+8%)."

Root Cause Analysis

"Your Career satisfaction dropped this month. Tracing upstream: the root cause appears to be a Learning domain decline (no new skills) that reduced confidence."

Intervention Targeting

"Rather than addressing Career directly, the highest-leverage intervention is improving your sleep quality (Health), which cascades to 4 other domains for you."

Model 05 — Circadian Intelligence

Circadian Rhythm Model

A specialized temporal model that learns individual daily energy patterns to predict optimal windows for different types of tasks — deep work, creative thinking, physical activity, and recovery.

Approach

Traditional productivity advice prescribes fixed schedules ("do deep work at 9am"). But individual circadian patterns vary dramatically. Our model learns each user's unique energy curve through check-in data, focus session performance, task completion timing, and self-reported energy levels.

The model uses a mixture of periodic functions (learned sinusoidal components) combined with contextual modifiers (sleep quality last night, day of week, recent stress levels) to produce a personalized hourly energy forecast. This drives the scheduling optimizer.

Architecture: Periodic Neural Network + MLP modifiers
Output: 24-hour energy curve at 30-min granularity
Personalization: Converges after ~14 days of check-ins
Accuracy: 82% energy level prediction (within 1 level)
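
The mixture of periodic functions can be sketched as a sum of learned sinusoids plus a contextual modifier. The amplitudes, periods, and phases below are illustrative, not fitted values:

```python
import math

def energy_curve(hour: float, components, sleep_quality: float = 0.5) -> float:
    """Predicted energy in [0, 1] at a given hour of day.
    components: list of (amplitude, period_hours, phase) sinusoids;
    sleep_quality shifts the whole curve up or down as a contextual modifier."""
    baseline = 0.5 + 0.2 * (sleep_quality - 0.5)
    periodic = sum(a * math.sin(2 * math.pi * hour / p + phase)
                   for a, p, phase in components)
    return min(1.0, max(0.0, baseline + periodic))

# toy curve: one 24h circadian component plus a 12h harmonic (post-lunch dip)
components = [(0.3, 24.0, -math.pi / 2), (0.1, 12.0, 0.0)]
curve = [energy_curve(h / 2.0, components) for h in range(48)]  # 30-min steps
```

In production the component parameters are learned per user from check-in data, which is why the curve only becomes reliable after roughly two weeks.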

Application: Intelligent Scheduling

The Circadian model feeds directly into the scheduling optimizer:

  • Deep work tasks scheduled during predicted peak cognitive windows
  • Creative/brainstorming tasks placed in slightly-below-peak "diffuse thinking" windows
  • Administrative tasks placed in low-energy recovery periods
  • Exercise/physical goals aligned with physical energy peaks (which differ from cognitive peaks)
  • Social/relationship goals scheduled when emotional energy is highest

Specialized Models for NLP, Decisions & Safety

Model 06 — NLP

Multi-task NLP Pipeline

A multi-task transformer (DeBERTa-v3 base) fine-tuned for simultaneous domain classification, sentiment analysis, named entity recognition (goals, people, activities), and intent detection. Processes every text input to the platform in real-time.

Architecture: DeBERTa-v3 base (multi-head)
Tasks: Domain (8-class) + Sentiment + NER + Intent
Accuracy: 94.2% domain, 91.8% intent, 89.5% NER
Latency: 3ms on TensorRT (INT8)

Model 07 — Social Graph

Social Influence Network

Graph neural network modeling accountability dynamics in Tribes (social groups). Learns influence patterns: who motivates whom, optimal group compositions for sustained engagement, and when peer interventions (kudos, challenges) are most effective.

Architecture: GraphSAGE with temporal edges
Features: User embeddings + interaction history
Output: Influence scores, group recommendations
Training: RAPIDS cuGraph + PyTorch Geometric

Model 08 — Reinforcement Learning

Intervention Optimizer

PPO-based RL agent that learns the optimal timing, type, and intensity of behavioral interventions for each user. Balances immediate engagement against long-term behavioral change, explicitly modeling notification fatigue and diminishing returns.

Algorithm: Proximal Policy Optimization (PPO)
State: 256-dim behavioral context vector
Actions: 12 intervention types × 24 time slots
Reward: 7-day rolling behavior adherence score

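
PPO's core update is the clipped surrogate objective; a minimal per-sample sketch (epsilon is PPO's conventional default, an assumption here):

```python
def ppo_clipped_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-sample PPO surrogate: the minimum of the unclipped and clipped
    policy-ratio terms, which caps how far one update can move the
    intervention policy away from the behavior that collected the data."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)

# a large policy shift (ratio 1.5) on a positive advantage is capped at 1.2
gain = ppo_clipped_objective(1.5, 1.0)
```

That conservatism matters here: an over-aggressive policy update on a noisy 7-day adherence reward could swing notification behavior sharply, which is exactly the fatigue effect the agent is meant to avoid.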
Model 09 — Anomaly Detection

Wellbeing Sentinel

Variational autoencoder that learns each user's behavioral baseline and flags statistically significant deviations — early indicators of burnout, disengagement, or wellbeing decline. Triggers graduated intervention protocols from gentle check-ins to escalated support suggestions.

Architecture: Conditional VAE with temporal encoder
Latent Space: 64-dim, per-user calibrated
Detection: Burnout, disengagement, mood decline
Sensitivity: Adaptive thresholds (minimize false positives)
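
The detection logic reduces to a reconstruction-error score compared against a per-user adaptive threshold; a minimal sketch (the k = 3 standard-deviation rule is an illustrative assumption):

```python
import math

def reconstruction_error(x, x_recon):
    """Mean squared error between a behavioral window and its VAE reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_recon)) / len(x)

def is_anomalous(score, score_history, k=3.0):
    """Adaptive threshold: flag scores more than k standard deviations above
    this user's own historical mean, so sensitivity calibrates per user."""
    mean = sum(score_history) / len(score_history)
    var = sum((s - mean) ** 2 for s in score_history) / len(score_history)
    return score > mean + k * math.sqrt(var)

history = [0.10, 0.12, 0.09, 0.11, 0.10, 0.13, 0.08]
baseline_day = is_anomalous(0.11, history)     # within this user's normal band
burnout_signal = is_anomalous(0.95, history)   # far above their baseline
```

Because the threshold tracks each user's own score history, a naturally erratic user doesn't trigger constant alerts, while a very consistent user can be flagged on a subtler shift.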

Training Philosophy: Personalization at Scale

Our model architecture follows a two-tier approach: shared foundation models that capture universal behavioral patterns, enhanced by lightweight per-user adapters that personalize predictions.

01

Foundation → Adapter Architecture

Large shared models (trained on collective behavioral patterns) provide robust base predictions. Lightweight LoRA adapters (0.5-2M parameters) fine-tune per user, enabling personalization without per-user training costs. Adapters converge within 7-14 days of user data.

02

Federated Learning for Privacy

User behavioral data never leaves their shard. We use federated averaging to improve shared model weights: local gradients computed on-device (or on user-specific server partitions) are aggregated without exposing raw data. Differential privacy noise (epsilon=8) provides formal privacy guarantees.
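
The aggregation step can be sketched as clipped federated averaging with calibrated noise. The clip norm and noise scale below are illustrative; in production they are tuned to meet the stated epsilon=8 budget:

```python
import math
import random

def clip_update(update, max_norm=1.0):
    """Scale a client's weight delta so its L2 norm is at most max_norm,
    bounding any single user's influence on the shared model."""
    norm = math.sqrt(sum(u * u for u in update))
    if norm <= max_norm:
        return list(update)
    return [u * max_norm / norm for u in update]

def fedavg_with_noise(client_updates, max_norm=1.0, noise_std=0.01, seed=0):
    """Average clipped client updates and add Gaussian noise; with properly
    calibrated noise this yields a differential-privacy guarantee."""
    rng = random.Random(seed)
    clipped = [clip_update(u, max_norm) for u in client_updates]
    n, dim = len(clipped), len(clipped[0])
    return [sum(c[i] for c in clipped) / n + rng.gauss(0.0, noise_std)
            for i in range(dim)]

updates = [[0.2, -0.1], [3.0, 4.0], [-0.1, 0.3]]  # second client gets clipped
merged = fedavg_with_noise(updates)
```

Clipping is what makes the noise meaningful: without a bound on per-client contribution, no finite noise level yields a privacy guarantee.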

03

Continuous Online Learning

Models don't wait for batch retraining. Our custom CUDA kernel for personalized embedding updates enables real-time adaptation. When a user's behavior shifts (new job, life event), the model detects the distribution shift and accelerates adapter learning rate to re-converge within 48 hours.
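
The distribution-shift trigger can be sketched as comparing recent behavior against the user's baseline and boosting the adapter learning rate when drift is large. Window sizes, threshold, and boost factor below are illustrative assumptions:

```python
def adapter_learning_rate(signal_history, base_lr=1e-4, boost=10.0,
                          baseline_n=28, recent_n=7, threshold=2.0):
    """Boost the per-user adapter LR when the recent mean of a behavioral
    signal drifts more than `threshold` baseline standard deviations.
    A small epsilon guards against a zero-variance baseline."""
    baseline = signal_history[-(baseline_n + recent_n):-recent_n]
    recent = signal_history[-recent_n:]
    mean_b = sum(baseline) / len(baseline)
    std_b = (sum((x - mean_b) ** 2 for x in baseline) / len(baseline)) ** 0.5
    mean_r = sum(recent) / len(recent)
    drift = abs(mean_r - mean_b) / (std_b + 1e-8)
    return base_lr * boost if drift > threshold else base_lr

stable = [0.5] * 28 + [0.5] * 7    # no change => base learning rate
shifted = [0.5] * 28 + [0.9] * 7   # new-job-style shift => boosted rate
```

A higher learning rate lets the 0.5-2M-parameter adapter re-converge quickly after a life event without touching the shared foundation weights.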

04

Multi-task Joint Optimization

Models that share users benefit from joint training. The NLP pipeline, embedding model, and Chronos predictor share lower encoder layers, enabling knowledge transfer: better sentiment understanding improves energy prediction, and vice versa. Trained end-to-end on multi-task loss.

9 Models. One Unified Intelligence.

Every model in the TimeStack suite is trained on NVIDIA GPUs, optimized through TensorRT, and served via Triton. Together, they form a single, comprehensive AI system for understanding and optimizing human behavior.