raminderpalsingh.com

ScienceClaw

Autonomous Life Sciences Research Platform

Powered by OpenClaw · A research initiative by HitchhikersAI.org, sponsored by 20/15 Visioneers
Feedback & questions welcome: raminderpal@hitchhikersai.org

Autonomous intelligence for oncology clinical trials

ScienceClaw continuously monitors oncology clinical trial registries for design anomalies, synthesises trial signals with daily news intelligence, and delivers findings by email and on a live website.

What it does

Insights — Weekly

Synthesised Trial Intelligence

A daily KB-building cycle scans oncology news across five topic areas, cross-references confirmed trial findings, and synthesises intelligence into evidence-backed insight records. Published weekly to Insights after Kimi K2.5 compliance verification.

Trial Intelligence — Daily

Clinical Trial Anomaly Detection

Builds empirical baselines from 4,000+ oncology trials at the indication–modality–line-of-therapy level. Scores each new trial across six deviation dimensions, investigates through six public APIs, and reports only findings passing a three-part confidence filter. Raw findings at Findings.

[Diagram: ScienceClaw architecture]

Email In / Email Out: the only interface; no dashboards, no logins.

OpenClaw: multi-agent orchestration framework that schedules tasks, isolates sessions, and manages handoffs. LLM-agnostic · Configurable to any vertical · Open source.

Workflows

  • Trial Intelligence: Anomaly detection · Investigation · Watchlist · Findings
  • Insights: Daily KB building · Synthesis · Weekly intelligence

Safeguards (how the system ensures it is doing the right things)

  • Meta-Guardrail: a single document defines the system's entire remit: what topics are in scope, what behaviour is permitted, what evidence standards must be met, and what tone to use. Every output is checked against this document before delivery.
  • Cross-Model Verification: the AI that writes is never the AI that checks. A separate, independent model verifies every outbound communication against the meta-guardrail. If the check fails, the draft is revised and re-verified.
  • Deterministic Computation: scripts handle all data collection, counting, and scoring. The AI is confined to reasoning and interpretation; it cannot contradict computed facts, write to files, or modify the knowledge base directly.
  • Evidence Standards: every factual claim must trace to a specific source. Sources are ranked: peer-reviewed highest, then multi-source news, then trade publications, then vendor materials. Missing data is flagged, not filled.

Knowledge Base: Grows daily · Tiered memory (hot / warm / cold) · All findings grounded in sourced evidence. The KB feeds future queries and daily research.

How ScienceClaw works — OpenClaw orchestrates two workflows, with four safeguards ensuring every output is scoped, verified, evidence-grounded, and computationally honest

Design principles

Evidence only — the AI never fabricates data
Gaps flagged, not filled with plausible guesses
Email is the only interface — no dashboards to learn
Human always in the loop for decisions
Cross-model verification — the writer and checker are different AIs
Scope-governed by an explicit meta-guardrail
Deterministic computation, agentic reasoning — scripts count, LLMs interpret

The opportunity

Life sciences teams spend significant time on two recurring problems: gathering and synthesising market intelligence across fragmented sources, and tracking the design landscape of clinical trials in their therapeutic area. Both are manual, repetitive, and error-prone.

For market intelligence, competitive landscapes shift daily. Clinical trial readouts, regulatory decisions, and partnership announcements can alter strategic priorities overnight. Most organisations respond with periodic, manual research cycles — quarterly reports, ad-hoc analyst requests, internal briefings that are stale before they're circulated. ScienceClaw replaces that with a continuous, autonomous research process. The knowledge base grows every day. Cross-domain connections that analysts often miss are surfaced automatically. On-demand queries combine accumulated intelligence with targeted web searches, and every outbound communication is verified by an independent AI model before delivery.

For trial intelligence, no systematic method exists for continuously detecting when a newly registered trial's design deviates from established norms. Commercial intelligence services address this through curated databases and periodic reports, but these are retrospective, expensive, and optimised for landscape coverage rather than early signal detection. ScienceClaw builds empirical baselines at the indication–modality–line-of-therapy level from 4,000+ real trials, scores each new trial using a weighted composite formula, investigates anomalies through six public biomedical APIs, and reports only findings that survive a three-part confidence filter. The first operational deployment surfaced confirmed high-impact findings including novel PARP inhibitor designs in bladder cancer and the first 4-1BB agonist to reach Phase 3 in NSCLC.

Both workflows share the same interface — email — and the same design philosophy: evidence only, gaps flagged, human always in the loop.

This is not a chatbot. It is a set of structured research and surveillance workflows with anti-hallucination guardrails, cross-model verification, baseline-deviation anomaly detection, and a self-expanding topic registry. The AI is a component of the system, not the system itself.

Method

Insights Method

KB-Building & Synthesis Workflow

The Insights tab is produced by a daily KB-building cycle: news scanning, two-pass topic analysis, cross-synthesis, and insight record extraction — the same architecture described in the paper below, now configured for the oncology clinical trial landscape.

Download the methods paper v1.1 (PDF)

How it works

You send an email

Tag the subject line with [MARKET RESEARCH] and your topics. Describe what you need — competitive landscape, pipeline comparison, regulatory outlook. The system extracts intent and entities from your query using natural-language parsing with entity recognition.

Evidence is gathered from the knowledge base and web

The platform matches your topics against its accumulated intelligence: daily market briefs, cross-domain synthesis notes, and news scan tags. When the knowledge base has gaps, targeted web searches constructed from extracted entities supplement the KB evidence. A source quality hierarchy weights claims: peer-reviewed sources highest, vendor press releases lowest.

AI produces a sourced analysis

The analysis model reads the gathered evidence alongside your question. It is instructed to use only the provided evidence, separate facts from interpretation, and attribute every factual claim to a specific source. Missing data is flagged, not filled with plausible guesses.

A second AI verifies the response

Before sending, an independent verification model checks the draft against a meta-guardrail: scope compliance, evidence standards, vendor neutrality, and tone. This cross-model verification prevents the blind-spot problem of self-verification — the model that writes the analysis is never the model that checks it.

You receive a verified memo

A reply arrives in the same email thread, structured as: Key Takeaways, Supporting Evidence, Risks and Unknowns, and Recommended Next Steps. Every factual claim traces to a specific source. If the evidence is weak or contradictory, that's stated clearly rather than papered over.

What makes this different

Knowledge-base grounded with web fallback

Every claim traces to a specific source in the knowledge base or a cited web search result. The daily autonomous cycle builds a cumulative intelligence layer; on-demand queries supplement this with targeted searches when coverage gaps are detected. A four-tier source quality hierarchy ensures vendor press releases are weighted below peer-reviewed evidence.
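A minimal sketch of how a four-tier source ranking could order claims. The tier names follow the hierarchy described on this page; the numeric weights, the `Claim` structure, and the sample claims are illustrative assumptions, not the platform's actual values.

```python
from dataclasses import dataclass

# Tier order follows the page's hierarchy; the numeric weights are assumed.
SOURCE_TIER_WEIGHT = {
    "peer_reviewed": 1.0,
    "multi_source_news": 0.7,
    "single_source_trade": 0.4,
    "vendor_material": 0.2,
}

@dataclass
class Claim:
    text: str
    source_id: str    # every claim must trace to a specific source
    source_tier: str  # one of the keys in SOURCE_TIER_WEIGHT

def rank_claims(claims):
    """Return claims ordered from strongest to weakest source tier."""
    return sorted(claims, key=lambda c: SOURCE_TIER_WEIGHT[c.source_tier], reverse=True)

# Hypothetical claims, for illustration only.
claims = [
    Claim("Vendor reports improved throughput", "url:vendor-press-release", "vendor_material"),
    Claim("Trade outlet reports a partnership", "url:trade-article", "single_source_trade"),
    Claim("Published study reports the primary result", "doi:placeholder", "peer_reviewed"),
]
ranked = rank_claims(claims)
```

Keeping the tier on each claim, rather than on the final memo, lets a downstream checker see which statements rest only on vendor material.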

Cross-model verification

The model that generates the analysis is never the model that verifies it. A separate language model checks every outbound communication against a meta-guardrail document — verifying scope compliance, evidence standards, vendor neutrality, and tone. If the check fails, the draft is revised and re-verified before sending.
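The write, check, revise loop described above can be sketched as follows. The function names, the `max_rounds` limit, and the behaviour on repeated failure are assumptions for illustration; the two callables stand in for two different language models.

```python
from typing import Callable, Tuple

def verified_draft(
    write: Callable[[str, str], str],
    check: Callable[[str], Tuple[bool, str]],
    question: str,
    max_rounds: int = 3,  # assumed limit, not stated in the source
) -> str:
    """Draft with one model, verify with another, revise until the check passes."""
    feedback = ""
    for _ in range(max_rounds):
        draft = write(question, feedback)
        passed, feedback = check(draft)
        if passed:
            return draft
    # Assumed fallback: never send an unverified draft.
    raise RuntimeError("draft failed verification; hold for human review")

# Toy stand-ins: the writer only cites a source after receiving feedback.
def toy_writer(question, feedback):
    return "Sourced answer [src:1]" if feedback else "Unsourced answer"

def toy_checker(draft):
    ok = "[src:" in draft
    return ok, "" if ok else "every claim needs a source"

result = verified_draft(toy_writer, toy_checker, "What changed this week?")
```

Passing the writer and checker in as separate callables makes it hard to wire the same model into both roles by accident.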

Autonomous term discovery

The system doesn't just track what you've told it to watch. A term discovery mechanism continuously identifies new vendors, products, and technologies from daily news scans. Discovered terms are added to the vocabulary and expand future search coverage — with scope filtering governed by the same meta-guardrail that governs outputs.

Cross-topic synthesis built in

The daily research cycle includes a dedicated synthesis stage that reads across all topics to find connections, contradictions, and emerging trends. On-demand queries can surface relevant signals from adjacent areas that a siloed analyst might miss.

[Diagram: end-to-end workflow]

A. Daily knowledge building (proactive, runs every morning)

  • News Scan: Brave Search · 24h window
  • Topic Analysis: two-pass per topic
  • Synthesis: cross-topic connections
  • Retrospective: quality review
  • Knowledge Base: Insight records · Synthesis · News tags · Term registry; Hot (14d) · Warm (15–60d) · Cold (archive)
  • Term Discovery: New drugs · Mechanisms · Sponsors; scope-filtered by meta-guardrail

B. Trial Intelligence daily pipeline (11:45–14:35 UK, deterministic + agentic)

  • Registry Scan: ClinicalTrials.gov v2 + EU CTIS
  • Scoring: 6-dimension composite + first-in detection
  • Watchlist: status changes, endpoint amendments
  • Investigation: GLM-5 · Top 5 candidates · ChEMBL · OT · PubMed · FDA
  • 3-Part Filter: Real · Novel · Triangulated; silence if nothing passes
  • Output: Email + HTML report; findings that fail the confidence filter are filed, not reported

C. Governance (applies to every output)

  • Meta-Guardrail: Scope · Behaviour · Evidence standards · Tone
  • Cross-Model Verification: Kimi 2.5 checks GLM-5 outputs independently
  • Source Quality Hierarchy: 1. Peer-reviewed · 2. Multi-source news · 3. Single-source trade · 4. Vendor materials

D. Reliability (runs 24/7, no manual intervention)

  • Auto-restart · Health monitoring · Graceful fallback · Daily ops summary

End-to-end market research workflow — daily knowledge building feeds on-demand queries, with cross-model verification and meta-guardrail governance on every output
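The hot / warm / cold memory tiers named in the diagram (14 days, 15 to 60 days, archive) can be sketched as a simple age classifier. The function name and the use of calendar days are assumptions.

```python
from datetime import date

def memory_tier(record_date: date, today: date) -> str:
    """Classify a knowledge-base record by age: hot (up to 14d), warm (15-60d), cold."""
    age_days = (today - record_date).days
    if age_days <= 14:
        return "hot"
    if age_days <= 60:
        return "warm"
    return "cold"

today = date(2026, 3, 20)
tiers = [memory_tier(d, today) for d in (date(2026, 3, 15), date(2026, 2, 10), date(2025, 11, 1))]
```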

Example output

The following illustrates a real on-demand research response produced by the same KB-building pipeline, configured for a different vertical. In the oncology configuration, the pipeline produces daily synthesis over checkpoint combinations, ADC pipeline, bispecifics, biomarker enrichment, and regulatory endpoint shifts.

Real output example: KB-building pipeline output (lab automation configuration)
[Image: Market research output showing key takeaways, evidence tables, screening modalities]
[Image: Market research output showing integration failure modes, cross-market synthesis, risks]

Self-expanding research coverage

The Insights cycle doesn't only cover the five seed topics. Two mechanisms expand coverage. First, the daily retrospective detects signals that don't match existing topics and proposes new research areas for operator review. Second, term discovery identifies new drugs, mechanisms, sponsors, and indications — each proposal requiring two independent source signals before review.

You reply to topic proposals with a simple approve or reject command. Approved topics immediately join the daily research cycle and become available for on-demand queries. Discovered terms are filtered by the meta-guardrail to ensure they fall within the system's defined scope before being added. The platform gradually expands its coverage toward where the market is moving, not just where you originally pointed it.

Guardrails: a meta-guardrail document defines the system's scope and behavioural rules. Every outbound communication is verified against this document by an independent language model before sending. The AI can propose at most three new topics per day, each requiring at least two independent source signals. Your manually curated topic registry is never modified automatically. All proposals require explicit human approval before activation.
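A sketch of the proposal guardrails stated above: at most three new topics per day, each backed by at least two independent source signals. The data shape and the tie-break order are assumptions, and selected proposals would still wait for human approval.

```python
MAX_PROPOSALS_PER_DAY = 3    # stated limit
MIN_INDEPENDENT_SOURCES = 2  # stated requirement

def select_proposals(candidates):
    """candidates: dict mapping topic name -> set of independent source ids."""
    eligible = [
        (topic, sources)
        for topic, sources in candidates.items()
        if len(sources) >= MIN_INDEPENDENT_SOURCES
    ]
    # Assumed tie-break: better-supported topics first, then alphabetical.
    eligible.sort(key=lambda item: (-len(item[1]), item[0]))
    return [topic for topic, _ in eligible[:MAX_PROPOSALS_PER_DAY]]

# Hypothetical candidate topics and source ids.
candidates = {
    "topic_a": {"s1", "s2", "s3"},
    "topic_b": {"s1"},            # only one signal: not eligible
    "topic_c": {"s2", "s4"},
    "topic_d": {"s5", "s6"},
    "topic_e": {"s7", "s8"},      # eligible but over the daily cap
}
proposals = select_proposals(candidates)
```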

Method — Oncology Trials

Findings Method

Trial Intelligence Workflow

Automated anomaly detection for oncology clinical trial design. Focused exclusively on oncology — checkpoint inhibitors, ADCs, bispecifics, cell therapies, targeted agents, and more. The system builds empirical baselines at the indication–modality–line-of-therapy level from 4,000+ real oncology trials, scores each new trial using a weighted composite formula, investigates high-scoring deviations through public biomedical APIs, and reports only findings that survive a three-part confidence filter.

Download the methods paper v0.4 (PDF)

How it works

Baselines are built at indication–modality–line level

For each indication–modality–line-of-therapy triplet (e.g. first-line NSCLC checkpoint inhibitors, second-line-plus melanoma), the system queries ClinicalTrials.gov for all Phase 2 and Phase 3 interventional trials. It captures the modal primary endpoint, typical comparator, standard biomarkers, expected sample size range, design architecture, and sponsor composition. Line-of-therapy splitting reduced the false flagging rate from 61% to 44%. The system currently covers 17 indication–modality pairs with 75 baseline files (21 combined, 54 line-specific) built from 4,000+ real trials.
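A minimal sketch of what a baseline record might hold, assuming trials arrive as plain dictionaries. The field names and the toy trials are hypothetical; only the idea of a modal endpoint, endpoint shares, and a sample size range comes from the text.

```python
from collections import Counter
from statistics import quantiles

def build_baseline(trials):
    """trials: list of dicts with 'primary_endpoints' (list of str) and 'enrolment' (int)."""
    endpoint_counts = Counter(ep for t in trials for ep in t["primary_endpoints"])
    total = sum(endpoint_counts.values())
    # Quartile cut points of the enrolment targets; q1 and q3 bound the IQR.
    q1, _, q3 = quantiles([t["enrolment"] for t in trials], n=4)
    return {
        "n_trials": len(trials),
        "endpoint_share": {ep: n / total for ep, n in endpoint_counts.items()},
        "modal_endpoint": endpoint_counts.most_common(1)[0][0],
        "enrolment_iqr": (q1, q3),
    }

# Toy data, not real trials.
toy_trials = [
    {"primary_endpoints": ["PFS"], "enrolment": 100},
    {"primary_endpoints": ["PFS", "OS"], "enrolment": 200},
    {"primary_endpoints": ["OS"], "enrolment": 300},
    {"primary_endpoints": ["PFS"], "enrolment": 400},
]
baseline = build_baseline(toy_trials)
```

Shares are computed over endpoint designations rather than trials, since a trial can declare more than one primary endpoint.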

New trials are scored with a weighted composite formula

Each new or updated trial is compared against its baseline across six dimensions. A weighted composite anomaly score ranks candidates by severity — endpoint choice (weight 3.0), comparator strategy (2.0), design architecture (2.0), biomarker enrichment (1.5), and sample size (1.0). Scores reflect rarity: a trial using an endpoint present in 2% of the baseline scores higher than one at 40%. Trials below a threshold of 3.0 are excluded. This reduced daily candidates from ~118 to ~93.
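The weighted composite can be sketched as below. The weights and the 3.0 threshold are the ones stated above; the rarity transform (one minus the baseline share of the trial's choice) is an assumption, since the exact formula is not given here. Only the five dimensions with stated weights are included.

```python
# Weights as stated in the text.
WEIGHTS = {
    "endpoint": 3.0,
    "comparator": 2.0,
    "design": 2.0,
    "biomarker": 1.5,
    "sample_size": 1.0,
}
THRESHOLD = 3.0  # trials scoring below this are excluded

def composite_score(baseline_share):
    """baseline_share: dimension -> fraction of baseline trials making the same choice."""
    # Assumed rarity transform: a choice seen in 2% of the baseline contributes 0.98.
    return sum(WEIGHTS[dim] * (1.0 - share) for dim, share in baseline_share.items())

# Hypothetical trials: one with a rare endpoint, one that matches its baseline closely.
rare = composite_score({"endpoint": 0.02, "comparator": 0.40, "design": 0.90,
                        "biomarker": 0.80, "sample_size": 0.50})
common = composite_score({"endpoint": 0.40, "comparator": 0.90, "design": 0.90,
                          "biomarker": 0.90, "sample_size": 0.90})
candidates = [name for name, s in (("rare", rare), ("common", common)) if s >= THRESHOLD]
```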

First-in patterns are detected

A separate detection stage checks the top 20 daily candidates for three novel entry patterns: first_sponsor (no prior trials in this baseline), first_phase3 (drug entering Phase 3 for the first time in this indication), and first_combination (two agents not previously tested together). Any first-in detection adds a +3.0 boost, ensuring novel competitive entries always reach investigation.
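The three first-in checks can be sketched as set lookups against the baseline. The record fields are hypothetical, and the sketch assumes the +3.0 boost is applied once per trial rather than once per pattern.

```python
FIRST_IN_BOOST = 3.0  # stated boost

def first_in_patterns(trial, baseline):
    """Return the first-in patterns a trial triggers against its baseline."""
    patterns = []
    if trial["sponsor"] not in baseline["sponsors"]:
        patterns.append("first_sponsor")
    if trial["phase"] == 3 and trial["drug"] not in baseline["phase3_drugs"]:
        patterns.append("first_phase3")
    combo = frozenset(trial.get("agents", []))
    if len(combo) >= 2 and combo not in baseline["combinations"]:
        patterns.append("first_combination")
    return patterns

def apply_boost(score, patterns):
    # Assumption: any detection adds a single boost; boosts do not stack.
    return score + FIRST_IN_BOOST if patterns else score

# Hypothetical baseline and trial.
baseline = {
    "sponsors": {"Sponsor A"},
    "phase3_drugs": {"drug_x"},
    "combinations": {frozenset({"drug_x", "drug_y"})},
}
trial = {"sponsor": "Sponsor B", "phase": 3, "drug": "drug_z", "agents": ["drug_z"]}
patterns = first_in_patterns(trial, baseline)
boosted = apply_boost(2.5, patterns)
```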

Anomalies are investigated through adaptive chains

The top five candidates each day are investigated by an autonomous agent using ChEMBL, Open Targets, bioRxiv, openFDA, PubMed, and the EU Clinical Trials Information System. Investigation chains adapt to the deviation type: endpoint deviations trigger regulatory guidance searches; novel compounds trigger mechanism lookups. Each investigation is bounded by a budget of 10–20 API calls per candidate.
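A sketch of a budget-bounded investigation plan. The step names and the chain table are hypothetical; only the idea that chains depend on deviation type and stop at a per-candidate call budget comes from the text.

```python
class CallBudget:
    """Counts API calls for one candidate and refuses once the limit is reached."""
    def __init__(self, limit=20):
        self.limit = limit
        self.used = 0

    def spend(self):
        if self.used >= self.limit:
            return False
        self.used += 1
        return True

# Hypothetical mapping from deviation type to investigation steps.
CHAINS = {
    "endpoint": ["regulatory_guidance_search", "literature_search"],
    "new_compound": ["mechanism_lookup", "target_lookup"],
}

def plan_investigation(deviation_types, budget):
    """Collect steps for each deviation type until the budget runs out."""
    steps = []
    for dev in deviation_types:
        for step in CHAINS.get(dev, []):
            if not budget.spend():
                return steps
            steps.append(step)
    return steps

budget = CallBudget(limit=3)
steps = plan_investigation(["endpoint", "new_compound"], budget)
```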

Confidence filter and watchlist

Every anomaly must pass three checks: is the deviation real? Is it novel? Can it be triangulated? Findings that pass are emailed; silence means no anomalies. Reported trials are added to a persistent watchlist that monitors for status changes (terminated, withdrawn, suspended), endpoint amendments, and enrolment changes exceeding 20%. The watchlist currently tracks 93 active trials.
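The filter and the watchlist triggers can be sketched as two small functions. The field names are hypothetical; the three checks, the monitored statuses, and the 20% enrolment threshold are taken from the text.

```python
MONITORED_STATUSES = {"terminated", "withdrawn", "suspended"}

def passes_confidence_filter(finding):
    """A finding is reported only if it is real, novel, and triangulated."""
    return bool(finding["is_real"] and finding["is_novel"] and finding["is_triangulated"])

def watchlist_alerts(previous, current):
    """Compare two snapshots of a watched trial and list what changed."""
    alerts = []
    if current["status"] != previous["status"] and current["status"] in MONITORED_STATUSES:
        alerts.append("status_change")
    if current["primary_endpoint"] != previous["primary_endpoint"]:
        alerts.append("endpoint_amendment")
    if previous["enrolment"] > 0:
        change = abs(current["enrolment"] - previous["enrolment"]) / previous["enrolment"]
        if change > 0.20:  # strictly more than 20%
            alerts.append("enrolment_change")
    return alerts

# Hypothetical snapshots of one trial.
before = {"status": "recruiting", "primary_endpoint": "PFS", "enrolment": 100}
after = {"status": "suspended", "primary_endpoint": "PFS", "enrolment": 125}
alerts = watchlist_alerts(before, after)
```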

Six deviation dimensions

Endpoint deviation

Primary endpoint does not match the baseline's modal endpoint for its indication–modality pair.

Comparator deviation

Trial uses a comparator type (active, placebo, or none) that diverges from the baseline distribution.

Enrichment deviation

Eligibility criteria include a novel biomarker or omit one that the baseline shows as standard.

Sample size deviation

Enrolment target falls outside the baseline's interquartile range for its phase.

Design deviation

Trial uses a design architecture (basket, umbrella, adaptive platform) not represented in the baseline.

New entrant deviation

A sponsor or therapeutic modality not previously seen in this indication's baseline.

What makes this different

Anomaly detection, not landscape analysis

The system does not attempt to track all oncology trials. It builds baselines of what is normal at the indication–modality–line-of-therapy level, then scores deviations using a weighted composite formula. Most trials match their baseline and are filed without analysis. Only high-scoring deviations receive investigation and reporting.

Investigation, not just flagging

Detecting a deviation is not sufficient. The system investigates each anomaly through adaptive chains — searching for regulatory guidance changes, competitor readouts, safety signals, and published validation studies. Findings are reported as "explained" or "unexplained" — the latter often the most interesting.

All public, free data sources

ClinicalTrials.gov, EU CTIS, ChEMBL, Open Targets, bioRxiv, openFDA, PubMed, and conference abstracts (ASCO/ESMO/ASH) — all accessed via free public APIs. No commercial data subscriptions required. Reproducible by any research group.

Autonomous direction discovery

Four discovery mechanisms layer on top of daily detection: unmatched trial analysis surfaces blind spots in baseline coverage, temporal trend detection identifies endpoint drift and biomarker adoption, cross-trial convergence finds independent sponsors making the same unusual design choice, and sponsor portfolio tracking detects strategic moves like new indication entries or withdrawal clusters.

Illustrative baseline data

Different indication–modality pairs exhibit meaningfully different design conventions. NSCLC Phase 3 trials concentrate heavily on progression-free survival (36.8% of primary endpoint designations), creating a clear baseline against which deviations are detectable. Melanoma trials distribute more evenly across PFS, ORR, OS, and DFS — requiring a higher deviation threshold.

NSCLC Phase 3 (100 trials, 136 endpoints)

  • PFS dominates at 36.8% — a new trial choosing DoR or a PRO as sole primary endpoint would constitute a deviation
  • OS is second at 22.8%, consistent with regulatory precedent
  • DFS at 9.6% reflects perioperative trial designs
  • "Other" endpoints at 11.0% — manageable heterogeneity

Melanoma Phase 3 (50 trials, 72 endpoints)

  • PFS, ORR, OS, and DFS each represent 12–23% — no single dominant endpoint
  • "Other" category at 36.1% — substantial heterogeneity from broad clinical spectrum
  • Adjuvant, metastatic, and response-focused trial designs coexist
  • Higher deviation threshold required to avoid false positives

Operational anomaly findings

The following scenarios are drawn from the first week of operational deployment (13–20 March 2026) to illustrate the types of anomalies the system detects in practice.

First-in sponsor: olaparib in bladder cancer

AstraZeneca registered a Phase 3 trial of olaparib in bladder cancer without BRCA or HRR biomarker enrichment. Flagged as a first_sponsor deviation (no prior AstraZeneca trials in bladder × checkpoint-inhibitor baseline) and an enrichment deviation (omitting biomarker selection in a PARP inhibitor trial is unusual). Investigation via ChEMBL and PubMed found no published rationale for biomarker-unselected PARP inhibition in bladder cancer. Classified as "unexplained" — the most interesting category.

First Phase 3 for a novel mechanism: 4-1BB agonist in NSCLC

BioNTech's acasunlimab (a 4-1BB agonist) reached Phase 3 in NSCLC. The system detected a first_phase3 pattern: no 4-1BB agonist had previously entered Phase 3 in NSCLC. The +3.0 first-in boost elevated this trial above several higher-scoring baseline deviations, reflecting the clinical significance of a new mechanism class reaching late-stage development.

Convergent signals: mRNA in bladder cancer

In the same week, Roche and Merck both opened trials with mRNA-based components in bladder cancer. Neither company had prior bladder cancer mRNA trials. The system detected both as individual first_sponsor anomalies but did not automatically link them — highlighting the need for the cross-trial convergence detection mechanism now being implemented.

Limitations to be aware of: baselines depend on ClinicalTrials.gov data quality, which is sponsor-submitted and intentionally vague on design rationale. Indication–modality pairs with fewer than 15 trials cannot be monitored. The scoring weights (endpoint 3.0, comparator 2.0, etc.) were set based on clinical judgment, not empirical optimisation — the feedback mechanism is intended to enable data-driven calibration but requires months of sustained engagement. The system detects anomalous design choices — it does not assess whether those choices are good or bad. An experienced oncology analyst scanning the registry weekly would likely identify the same high-impact findings; the system's advantage is consistency and coverage, not clinical judgment. No prospective validation has been performed.

In Development

Knowledge Graph

ScienceClaw's third layer of intelligence: a persistent, growing graph that connects trials, drugs, mechanisms, sponsors, indications, and clinical findings across all three workflows — enabling convergence signals, strategic pivot detection, and cross-thread triangulation that flat-file analysis cannot produce.

The Knowledge Graph is currently under active development. The backbone architecture and Tier 1 ingest are complete. Tier 2 extraction and Tier 3 gap-fill agents are in the implementation phase. This page describes the design intent and the insights it is built to produce.

Why a knowledge graph

ScienceClaw's Trial Intelligence workflow detects individual anomalies. Its Oncology Insights workflow synthesises news into themes. Both produce valuable output — but they operate in isolation. A trial flagged as anomalous by Thread 2 and a bioRxiv preprint catalogued by Thread 3 might share a mechanism and an indication, forming a potential explanation. In the current architecture, no process connects them.

The Knowledge Graph solves this. Every entity — trial, drug, sponsor, mechanism, indication, regulatory event, KB theme — becomes a node. Every relationship becomes a traversable edge with a confidence score and a source reference. Cross-thread triangulation, multi-sponsor convergence detection, and regulatory impact propagation all become single graph queries rather than multi-pass scripting tasks.

The graph is built in Python using NetworkX, stored as a single JSON file, and updated daily after Trial Intelligence completes. There is no database server. The entire graph fits in memory. All queries run in the same process as the rest of ScienceClaw.

Three tiers of evidence

[Diagram: three tiers of evidence]

Tier 1: Structured APIs (deterministic · confidence 0.9–1.0)

  • ClinicalTrials.gov (NCT IDs): Trials · Sponsors · Endpoints
  • ChEMBL (ChEMBL IDs): Drugs · Mechanisms of action
  • Thread 2 findings.json: confirmed anomaly findings
  • EU CTIS: European trial registrations
  • Auto-insert. No review needed.

Tier 2: KB Extraction (GLM-5 assisted · confidence 0.5–0.75)

  • Thread 3 synthesis JSONL: daily oncology news synthesis
  • Insight records: Theme → Trial / Drug / Sponsor edges
  • Conference abstracts: ASCO · ESMO · ASH · AACR
  • Requires source_id on every edge. Unresolved entities → proposals queue.

Tier 3: Gap-Fill Agents (OpenClaw reasoning · confidence 0.35–0.85)

  • Mechanism lookup: ChEMBL search for unknown MoA
  • Convergence inference: multi-sponsor pattern detection
  • Regulatory propagation: guidance → affected trial links
  • Cross-thread triangulation: anomaly ↔ KB theme linkage
  • All output to proposals queue. Operator review before graph insert.

Three tiers of evidence — Tier 1 (structured APIs, auto-inserted), Tier 2 (KB extraction, source-verified), Tier 3 (agent reasoning, operator-reviewed). Every edge in the graph carries a confidence score and a source reference.

What the graph enables

Cross-trial convergence detection

The graph holds every trial's sponsor, mechanism, and indication as connected nodes. A single query can find indication–mechanism pairs where two or more independent sponsors have opened trials within a defined time window. This is a convergence signal that an analyst scanning CT.gov linearly is unlikely to catch reliably. The mRNA-in-bladder-cancer case from week one of deployment (two independent sponsors, same week, each flagged separately as a first_sponsor anomaly but never linked) is exactly the pattern this query is designed to surface automatically.
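A sketch of the convergence query. The page says the graph is built with NetworkX; to stay dependency-free this example runs over plain trial records instead, and the field names, window length, sponsor names, and dates are all hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

def convergence_signals(trials, window_days=30, min_sponsors=2):
    """Find (indication, mechanism) pairs where several sponsors registered trials close together."""
    by_pair = defaultdict(list)
    for t in trials:
        by_pair[(t["indication"], t["mechanism"])].append(t)

    signals = []
    for pair, group in by_pair.items():
        group.sort(key=lambda t: t["registered"])
        for i, anchor in enumerate(group):
            window_end = anchor["registered"] + timedelta(days=window_days)
            sponsors = {t["sponsor"] for t in group[i:] if t["registered"] <= window_end}
            if len(sponsors) >= min_sponsors:
                signals.append((pair, sorted(sponsors)))
                break  # one signal per pair is enough
    return signals

# Placeholder records, not real registrations.
trials = [
    {"indication": "bladder cancer", "mechanism": "mRNA", "sponsor": "Sponsor A",
     "registered": date(2026, 3, 14)},
    {"indication": "bladder cancer", "mechanism": "mRNA", "sponsor": "Sponsor B",
     "registered": date(2026, 3, 18)},
    {"indication": "NSCLC", "mechanism": "4-1BB agonist", "sponsor": "Sponsor C",
     "registered": date(2026, 3, 16)},
]
signals = convergence_signals(trials, window_days=7)
```

Grouping by pair first keeps the window scan small, since only trials sharing an indication and mechanism are ever compared.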

Sponsor portfolio pivot detection

Each sponsor's full trial history is encoded in the graph. When a new trial registers in an indication the sponsor has never entered before, the graph computes this as a first-indication event automatically — queryable retroactively across any time window and comparable across all sponsors simultaneously, without manual cross-referencing of state files.

Cross-thread triangulation

Thread 2 flags an unexplained anomaly: a PARP inhibitor trial with no biomarker enrichment. Thread 3's KB holds a bioRxiv preprint noting HRR signals in that tumour type. In the current flat-file architecture these two items never meet. In the graph, both connect to the same Indication and Mechanism nodes — a query surfaces the potential explanation automatically, and GLM-5 reasons over the subgraph rather than 40KB of markdown.

Regulatory signal propagation

When ODAC issues updated guidance on a surrogate endpoint, Thread 3's KB records the regulatory event as a node. The graph already holds all trials using that endpoint. A gap-fill agent proposes AFFECTED_BY edges connecting those trials to the regulatory event — automatically surfacing which watchlist trials face regulatory risk, without any manual lookup.

Design principles

Every edge has a source

No edge is inserted without a source_id — an NCT ID, ChEMBL ID, URL, or KB file path. An edge without a traceable source is rejected at insertion. This is enforced by the single write-access script, not by convention.

Confidence is computed, not declared

Every edge carries a confidence score between 0.0 and 1.0, set by deterministic rules based on the number and tier of supporting sources. The LLM never self-assesses confidence for graph insertion purposes — that would introduce the same overconfidence problem the cross-model verification architecture was designed to solve.
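One way deterministic confidence rules could look, as a sketch only. The tier ranges are those shown in the three-tier diagram; the rule itself (start at the bottom of the tier's range and add a fixed step per additional source, capped at the top) and the step size are assumptions, not the system's documented rules.

```python
# Ranges from the three-tier diagram: tier -> (lowest, highest) confidence.
TIER_RANGE = {1: (0.90, 1.00), 2: (0.50, 0.75), 3: (0.35, 0.85)}
STEP_PER_EXTRA_SOURCE = 0.05  # assumed increment

def edge_confidence(tier, n_sources):
    """Compute confidence from tier and source count; no model self-assessment involved."""
    if n_sources < 1:
        raise ValueError("an edge needs at least one source")
    low, high = TIER_RANGE[tier]
    return round(min(high, low + STEP_PER_EXTRA_SOURCE * (n_sources - 1)), 2)

examples = [edge_confidence(1, 1), edge_confidence(2, 10), edge_confidence(3, 1)]
```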

Single write-access chokepoint

One script — kg_writer.py — is the only process with write access to the graph. All ingest scripts, extraction agents, and gap-fill agents submit operations to a batch queue. This creates a single auditable point where schema rules are enforced before any data reaches the graph.
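A sketch of the checks a single write-access script might apply to a batch queue. This is not the contents of kg_writer.py; the required field names other than source_id, the exception type, and the sample edges are hypothetical.

```python
REQUIRED_EDGE_FIELDS = ("source", "target", "relation", "source_id", "confidence")

class EdgeRejected(ValueError):
    """Raised when an edge breaks a schema rule."""

def validate_edge(edge):
    for field in REQUIRED_EDGE_FIELDS:
        if edge.get(field) in (None, ""):
            raise EdgeRejected(f"missing {field}")
    if not 0.0 <= edge["confidence"] <= 1.0:
        raise EdgeRejected("confidence out of range")
    return edge

def apply_batch(graph_edges, queue):
    """Append valid edges to the graph; collect rejected ones with a reason."""
    accepted, rejected = 0, []
    for edge in queue:
        try:
            graph_edges.append(validate_edge(edge))
            accepted += 1
        except EdgeRejected as exc:
            rejected.append((edge, str(exc)))
    return accepted, rejected

# Placeholder identifiers, for illustration only.
queue = [
    {"source": "trial:NCT00000000", "target": "sponsor:example", "relation": "SPONSORED_BY",
     "source_id": "NCT00000000", "confidence": 0.95},
    {"source": "trial:NCT00000000", "target": "drug:example", "relation": "TESTS",
     "source_id": "", "confidence": 0.6},
]
graph_edges = []
accepted, rejected = apply_batch(graph_edges, queue)
```

Returning the rejected edges with a reason, rather than dropping them silently, gives the audit trail that a single chokepoint is meant to provide.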

Tier 3 always reviewed

Agent-inferred edges — convergence signals, regulatory propagation, cross-thread links — go to a daily proposals queue before graph insertion. A digest email summarises pending proposals. Auto-approval requires a source reference and confidence ≥ 0.8. Inferred edges below this threshold always require explicit approval. This mirrors the term discovery discipline already in place across Threads 1 and 3.
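The routing rule above can be sketched as follows. The 0.8 threshold and the source requirement are stated in the text; the return labels, the digest wording, and the sample proposals are assumptions.

```python
AUTO_APPROVE_MIN_CONFIDENCE = 0.8  # stated threshold

def route_proposal(edge):
    """Decide whether an inferred edge can be auto-approved or needs an operator."""
    if edge.get("source_id") and edge["confidence"] >= AUTO_APPROVE_MIN_CONFIDENCE:
        return "auto_approve"
    return "needs_operator_review"

def digest_line(proposals):
    """One-line summary for the daily digest email."""
    pending = sum(1 for p in proposals if route_proposal(p) == "needs_operator_review")
    return f"{pending} of {len(proposals)} proposals awaiting review"

# Hypothetical proposals.
proposals = [
    {"relation": "CONVERGES_WITH", "source_id": "kb/example.md", "confidence": 0.85},
    {"relation": "AFFECTED_BY", "source_id": "kb/example.md", "confidence": 0.55},
    {"relation": "EXPLAINS", "source_id": None, "confidence": 0.90},
]
routes = [route_proposal(p) for p in proposals]
summary = digest_line(proposals)
```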

Queries are confidence-filtered

Every query that feeds an emailed report filters on a minimum confidence threshold. A convergence finding derived solely from Tier 3 inferred edges is flagged as lower-certainty. A finding supported by multiple Tier 1 edges is flagged as high-certainty. The graph never flattens this distinction.

No database server

The graph runs as a Python object in-process, serialised to a single JSON file after each write session using an atomic rename. No port, no service to supervise, no failure mode from a database process being killed during a cron window. At the scale of this system — a few thousand nodes at maturity — this is the right choice.
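A sketch of the atomic-rename save described above, using only the standard library. The function name and the dictionary shape of the graph are assumptions; `os.replace` is atomic when the temporary file and the target are on the same filesystem, which is why the temporary file is created next to the target.

```python
import json
import os
import tempfile

def save_graph_atomic(graph, path):
    """Write the graph to a temp file, then swap it into place in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(graph, handle)
            handle.flush()
            os.fsync(handle.fileno())  # make sure bytes reach disk before the rename
        os.replace(tmp_path, path)     # readers see the old file or the new one, never a partial write
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the process is killed mid-write, the previous JSON file is still intact and only a stray temporary file is left behind.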

Honest limitations

The alias problem is real and requires sustained maintenance. Pembrolizumab, Keytruda, MK-3475, and pembro are the same drug — but CT.gov, ChEMBL, news coverage, and conference abstracts use all four. Without a continuously maintained canonical name table, these appear as four separate Drug nodes and convergence queries produce false negatives. The canonical table grows automatically as aliases are resolved, but the first weeks will have meaningful alias noise. Tier 2 edges derived from GLM-5 extraction will have a non-trivial error rate in the first months — estimated 15–25% on entity resolution — until the extraction prompt is tuned against real KB output. Tier 3 inferred relationships are genuinely uncertain; the confidence scores reflect this, but they do not guarantee correctness. The graph improves with time; early outputs should be treated as exploratory signals, not confirmed intelligence.
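A sketch of a canonical name table for the pembrolizumab example above. The table layout, the lower-casing rule, and the unresolved list are assumptions about how such a lookup might work.

```python
# Alias -> canonical name; the four aliases are the ones named in the text.
CANONICAL = {
    "pembrolizumab": "pembrolizumab",
    "keytruda": "pembrolizumab",
    "mk-3475": "pembrolizumab",
    "pembro": "pembrolizumab",
}

def canonical_drug(name, unresolved):
    """Map an alias to its canonical drug name, or record it for later resolution."""
    key = name.strip().lower()
    if key in CANONICAL:
        return CANONICAL[key]
    unresolved.append(name)  # assumed: unknown names are queued, not guessed
    return None

unresolved = []
resolved = [canonical_drug(n, unresolved) for n in ("Keytruda", " MK-3475 ", "pembro", "unknownmab")]
```

Queuing unknown names instead of creating a new Drug node for each one keeps alias noise visible until the table catches up.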

Live knowledge graph

Updated daily after the 15:50 pipeline run. Nodes are coloured by type — trials, drugs, sponsors, mechanisms, indications, and findings. Edge thickness reflects confidence score. Click any node to inspect its properties.


Get in touch

For more info, contact raminderpal@hitchhikersai.org