A quiet tool
for a precise craft.
Splicify is a quiet, deterministic design tool for molecular cloning. You describe the plasmid you want to build in plain language — or upload the GenBank files of what you already have — and Splicify classifies the request, resolves named parts against an in-house knowledge base, scores every cloning method that could build the target, and runs the chosen workflow end-to-end. The result is a primer set, a protocol, an annotated plasmid map, and a workflow trace that documents every decision.
The intent classifier and predesign pipeline are fully deterministic — keyword and regex rules in the front; Primer3, SBOL3, and a clean-room six-tier annotation pipeline in the back. Plain-language plasmid descriptions are matched to a corpus of >7,000 LLM-annotated reference plasmids by semantic retrieval, then edited deterministically: insertions under 40 bp ride on primer tails, insertions of 40 bp and longer become synthesis fragments. An optional LLM orchestrator slot is reserved for the cases where the deterministic edit set leaves gaps; today it ships as a no-op so every reply is reproducible.
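The length rule for insertions can be sketched in a few lines. This is a minimal illustration, not Splicify's actual code: the `Insertion` class, the `route_insertion` function, and the route labels are hypothetical names, and only the 40 bp threshold comes from the description above.

```python
from dataclasses import dataclass

# Threshold from the text: insertions under 40 bp ride on primer tails,
# insertions of 40 bp and longer become synthesis fragments.
TAIL_LIMIT_BP = 40


@dataclass(frozen=True)
class Insertion:
    name: str      # hypothetical label for the edit
    sequence: str  # bases to insert


def route_insertion(ins: Insertion) -> str:
    """Return 'primer_tail' for inserts under 40 bp, else 'synthesis_fragment'."""
    if len(ins.sequence) < TAIL_LIMIT_BP:
        return "primer_tail"
    return "synthesis_fragment"


edits = [
    Insertion("6xHis tag", "CATCACCATCACCATCAC"),  # 18 bp
    Insertion("spacer", "A" * 40),                 # exactly 40 bp
]
routes = {e.name: route_insertion(e) for e in edits}
```

Because the rule depends only on sequence length, the same input always takes the same route, which is what keeps the edit step reproducible.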
The primer-design algorithm uses Primer3 to calculate primer characteristics, then weighs annealing Tm, overlap Tm, mispriming, primer-dimer risk, secondary structures, fragment count, and length to choose the primer extensions most likely to give successful PCR and assembly. The result is a full picture of the factors contributing to experimental success.
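One common way to weigh several factors at once is a weighted penalty, where the candidate with the lowest total wins. The sketch below assumes that approach; it is not Splicify's scoring function. The `Candidate` fields, the target temperatures, and every weight are hypothetical values chosen for illustration, and the thermodynamic numbers are assumed to have been computed beforehand (for example by Primer3).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    annealing_tm: float   # deg C, assumed precomputed (e.g. by Primer3)
    overlap_tm: float     # deg C, Tm of the assembly overlap
    dimer_dg: float       # kcal/mol, more negative = more stable dimer
    hairpin_dg: float     # kcal/mol, more negative = more stable hairpin
    fragment_count: int   # fragments in the assembly
    primer_length: int    # nt, including the extension


# Hypothetical targets and weights, for illustration only.
TARGET_ANNEAL_TM = 60.0
TARGET_OVERLAP_TM = 55.0
WEIGHTS = {
    "anneal": 1.0, "overlap": 1.0, "dimer": 0.5,
    "hairpin": 0.5, "fragments": 2.0, "length": 0.1,
}


def penalty(c: Candidate) -> float:
    """Sum of weighted penalties; lower is better."""
    return (
        WEIGHTS["anneal"] * abs(c.annealing_tm - TARGET_ANNEAL_TM)
        + WEIGHTS["overlap"] * abs(c.overlap_tm - TARGET_OVERLAP_TM)
        + WEIGHTS["dimer"] * max(0.0, -c.dimer_dg)      # only stable dimers cost
        + WEIGHTS["hairpin"] * max(0.0, -c.hairpin_dg)  # only stable hairpins cost
        + WEIGHTS["fragments"] * c.fragment_count
        + WEIGHTS["length"] * c.primer_length
    )


def best(candidates: list[Candidate]) -> Candidate:
    """Pick the candidate with the lowest total penalty."""
    return min(candidates, key=penalty)


a = Candidate(60.0, 55.0, -2.0, 0.0, 2, 40)
b = Candidate(64.0, 50.0, -8.0, -3.0, 3, 60)
chosen = best([a, b])
```

Keeping each term separate means the per-factor contributions can be reported alongside the total, rather than only a single score.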
Open-source acknowledgements
Splicify stands on the shoulders of the scientific software community. We are grateful to the authors and maintainers of every project below for making their work openly available.
Software & libraries
- Primer3 — primer design, thermodynamic calculations, hairpin / homodimer scoring.
- SBOL3 (Synthetic Biology Open Language v3) — standardised export of modules and SBO-typed interactions; round-trip via the pySBOL3 reference implementation.
- BioPython — GenBank / FASTA parsing, sequence record manipulation, feature handling.
- Sentence-Transformers (UKPLab) and the all-MiniLM-L6-v2 model — embedding plasmid token streams and natural-language descriptions for semantic retrieval.
- HNSWlib — approximate nearest-neighbour index used during corpus build (runtime queries are brute-force cosine over a NumPy array).
- SeqViz (Lattice Automation) — interactive DNA sequence visualisation in the linear viewer.
- BLAST+, MMseqs2, Infernal — feature search across the six annotation reference tiers.
- FastAPI, Next.js, React, PyTorch — the application and ML stack.
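The brute-force cosine query mentioned in the HNSWlib entry above needs nothing beyond NumPy. The following is a minimal sketch under stated assumptions: the function name `top_k_cosine` and the toy 4-dimensional vectors are hypothetical (all-MiniLM-L6-v2 produces 384-dimensional embeddings), and it is not taken from Splicify's code.

```python
import numpy as np


def top_k_cosine(query: np.ndarray, corpus: np.ndarray, k: int = 3):
    """Return (indices, scores) of the k corpus rows most similar to query."""
    # Normalise rows and query to unit length so a dot product equals cosine similarity.
    corpus_unit = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    scores = corpus_unit @ query_unit
    order = np.argsort(-scores)[:k]  # highest similarity first
    return order, scores[order]


# Toy embeddings standing in for the plasmid corpus (one row per plasmid).
corpus = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
query = np.array([0.2, 1.0, 0.0, 0.0])
idx, sims = top_k_cosine(query, corpus, k=2)
```

An exact scan like this is a single matrix-vector product over the corpus, and its results do not depend on index build parameters.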
Sequence & feature data
- Addgene — 1,767 curated reference plasmids spanning nine functional families (basic cloning vectors, CRISPR plasmids, fluorescent-protein vectors, Gateway destination / entry vectors, I.M.A.G.E. Consortium plasmids, insect-cell vectors, luciferase vectors, Lucigen vectors, mammalian expression vectors). Used both as the regression set for the annotation pipeline and as part of the retrieval corpus for plain-language plasmid design.
- NCBI RefSeq and NCBI engineered plasmids — 41 RefSeq + 5,414 engineered records contribute to the 7,256-plasmid retrieval corpus, with sequence and metadata fetched via Entrez.
- VectorBuilder — 34 representative vectors plus 26 shorthand description ↔ token pairs that seeded the description-conditioned generative model.
- GenoLIB — 1,062 main-tier nucleotide features and 706 GenoLIB CDS translations underpin the clean-room feature reference (post-pLannotate, 2026-04-19).
- FPbase — 721 fluorescent-protein records; identifies and classifies reporter CDSs.
- UniProt / SwissProt — 66,221 curated PE-1 and whitelisted protein entries for protein-level feature search.
- Rfam — 1,737 curated families covering riboswitches, ribozymes, cis-elements, and structured non-coding RNAs.
- Gene Ontology — Sequence Ontology (SO) and Systems Biology Ontology (SBO) — role and interaction URIs that flow through to SBOL3 export.
Contact
Splicify was created by Devon Fitzpatrick, with advice on automation and business development from Rishij Mewada and help from many friends in the molecular-biology community.