usage: trill [-h] [--nodes NODES] [--logger LOGGER] [--profiler]
             [--RNG_seed RNG_SEED] [--outdir OUTDIR] [--n_workers N_WORKERS]
             name GPUs

Positional Arguments


Name of run


Input total number of GPUs per node

Default: 1


Possible choices: embed, finetune, inv_fold_gen, lang_gen, diff_gen, classify, fold, visualize, simulate, dock, utils

Named Arguments


Input total number of nodes. Default is 1

Default: 1


Enable Tensorboard logger. Default is None

Default: False


Utilize PyTorchProfiler

Default: False


Input RNG seed. Default is 123

Default: 123


Input full path to directory where you want the output from TRILL

Default: “.”


Change number of CPU cores/’workers’ TRILL uses

Default: 1



Embed proteins of interest

trill embed [-h] [--batch_size BATCH_SIZE] [--finetuned FINETUNED] [--per_AA]

Positional Arguments


Possible choices: esm2_t6_8M, esm2_t12_35M, esm2_t30_150M, esm2_t33_650M, esm2_t36_3B, esm2_t48_15B, ProtT5-XL, ProstT5, Ankh, Ankh-Large

Choose protein language model to embed query proteins


Input protein fasta file

Named Arguments


Change batch-size number for embedding proteins. Default is 1, but with more RAM, you can do more

Default: 1


Input path to your own finetuned ESM model

Default: False


Add this flag to return the per amino acid representations.

Default: False


Add this flag to return the average, whole sequence representation.

Default: False
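
As a rough illustration of how the global arguments and the embed arguments combine on the command line, the sketch below assembles an embed invocation as an argument list. The helper name `build_embed_command`, the run name, and the file paths are hypothetical. Placing the subcommand after the run name and GPU count is an assumption based on the top-level usage line. The command is only printed, not executed.

```python
import shlex


def build_embed_command(run_name, gpus, model, fasta,
                        batch_size=1, per_aa=False, outdir="."):
    """Assemble an argv list for `trill ... embed` (hypothetical helper)."""
    # Global options first, then run name and GPU count, then the subcommand.
    argv = ["trill", "--outdir", outdir, run_name, str(gpus),
            "embed", model, fasta, "--batch_size", str(batch_size)]
    if per_aa:
        argv.append("--per_AA")  # request per amino acid representations
    return argv


cmd = build_embed_command("my_run", 1, "esm2_t12_35M", "query.fasta",
                          batch_size=4, per_aa=True, outdir="results")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would launch the run on a machine where TRILL is installed.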


Finetune protein language models

trill finetune [-h] [--epochs EPOCHS] [--save_on_epoch] [--lr LR]
               [--batch_size BATCH_SIZE] [--mask_fraction MASK_FRACTION]
               [--pre_masked_fasta] [--strategy STRATEGY]
               [--ctrl_tag CTRL_TAG] [--finetuned FINETUNED]

Positional Arguments


Possible choices: esm2_t6_8M, esm2_t12_35M, esm2_t30_150M, esm2_t33_650M, esm2_t36_3B, esm2_t48_15B, ProtGPT2, ZymCTRL

Choose the protein language model to finetune. Note that ESM2 is trained with the MLM objective, while ProtGPT2/ZymCTRL are trained with the CLM objective.


Input fasta file

Named Arguments


Number of epochs for fine-tuning. Default is 10

Default: 10


Saves a checkpoint on every successful epoch completed. WARNING, this could lead to rapid storage consumption

Default: False


Learning rate for optimizer. Default is 0.0001

Default: 0.0001


Change batch-size number for fine-tuning. Default is 1

Default: 1


ESM: Change fraction of amino acids masked for MLM training. Default is 0.15

Default: 0.15


ESM: Use this flag to specify that your input fasta will be pre-masked and does not need masking performed by TRILL. The sequences will still be randomly shuffled.

Default: False


Change training strategy. Default is None. List of strategies can be found at


ZymCTRL: Choose an Enzyme Commission (EC) control tag for finetuning ZymCTRL. Note that the tag must match all of the enzymes in the query fasta file. You can find all ECs here


Input path to your previously finetuned model to continue finetuning

Default: False
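
To make --mask_fraction concrete, here is a minimal sketch of random masking for masked-language-model (MLM) training. It is a generic illustration, not TRILL's implementation: the function name `mask_sequence`, the `<mask>` placeholder string, the example sequence, and the rounding rule are all assumptions.

```python
import random


def mask_sequence(seq, mask_fraction=0.15, seed=123, mask_token="<mask>"):
    """Replace a random fraction of residues with a mask token.

    Returns the token list after masking and the sorted masked positions.
    """
    rng = random.Random(seed)
    # Mask at least one residue, but never more than the sequence holds.
    n_mask = min(len(seq), max(1, round(len(seq) * mask_fraction)))
    positions = sorted(rng.sample(range(len(seq)), n_mask))
    tokens = list(seq)
    for pos in positions:
        tokens[pos] = mask_token
    return tokens, positions


original = "MKTAYIAKQRQISFVKSHFS"  # made-up 20-residue sequence
tokens, positions = mask_sequence(original)
print(len(positions))  # 3 positions masked (15% of 20)
print("".join(tokens))
```

During training, the model is asked to predict the original residue at each masked position, so a higher fraction hides more of each sequence per example.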


Generate proteins using inverse folding

trill inv_fold_gen [-h] [--temp TEMP]
                   [--num_return_sequences NUM_RETURN_SEQUENCES]
                   [--max_length MAX_LENGTH] [--top_p TOP_P]
                   [--repetition_penalty REPETITION_PENALTY] [--dont_sample]
                   [--mpnn_model MPNN_MODEL] [--save_score SAVE_SCORE]
                   [--save_probs SAVE_PROBS] [--score_only SCORE_ONLY]
                   [--path_to_fasta PATH_TO_FASTA]
                   [--conditional_probs_only CONDITIONAL_PROBS_ONLY]
                   [--conditional_probs_only_backbone CONDITIONAL_PROBS_ONLY_BACKBONE]
                   [--unconditional_probs_only UNCONDITIONAL_PROBS_ONLY]
                   [--backbone_noise BACKBONE_NOISE] [--batch_size BATCH_SIZE]
                   [--pdb_path_chains PDB_PATH_CHAINS]
                   [--chain_id_jsonl CHAIN_ID_JSONL]
                   [--fixed_positions_jsonl FIXED_POSITIONS_JSONL]
                   [--omit_AAs OMIT_AAS] [--bias_AA_jsonl BIAS_AA_JSONL]
                   [--bias_by_res_jsonl BIAS_BY_RES_JSONL]
                   [--omit_AA_jsonl OMIT_AA_JSONL] [--pssm_jsonl PSSM_JSONL]
                   [--pssm_multi PSSM_MULTI] [--pssm_threshold PSSM_THRESHOLD]
                   [--pssm_log_odds_flag PSSM_LOG_ODDS_FLAG]
                   [--pssm_bias_flag PSSM_BIAS_FLAG]
                   [--tied_positions_jsonl TIED_POSITIONS_JSONL]
                   {ESM-IF1,ProteinMPNN,ProstT5} query

Positional Arguments


Possible choices: ESM-IF1, ProteinMPNN, ProstT5

Select which model to generate proteins using inverse folding.


Input pdb file for inverse folding

Named Arguments


Choose sampling temperature.

Default: “1”


Choose number of proteins to generate.

Default: 1


Max length of proteins generated, default is 500 AAs

Default: 500


ProstT5: If set to float < 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation. Default is 1

Default: 1


ProstT5: The parameter for repetition penalty. 1.0 means no penalty, the default is 1.2

Default: 1.2


ProstT5: By default, the model will sample to generate the protein. With this flag, you can enable greedy decoding, where the most probable tokens will be returned.

Default: True


ProteinMPNN: v_48_002, v_48_010, v_48_020, v_48_030; v_48_010=version with 48 edges 0.10A noise

Default: “v_48_020”


ProteinMPNN: 0 for False, 1 for True; save score=-log_prob to npy files

Default: 0


ProteinMPNN: 0 for False, 1 for True; save MPNN predicted probabilities per position

Default: 0


ProteinMPNN: 0 for False, 1 for True; score input backbone-sequence pairs

Default: 0


ProteinMPNN: score provided input sequence in a fasta format; e.g. GGGGGG/PPPPS/WWW for chains A, B, C sorted alphabetically and separated by /

Default: “”


ProteinMPNN: 0 for False, 1 for True; output conditional probabilities p(s_i given the rest of the sequence and backbone)

Default: 0


ProteinMPNN: 0 for False, 1 for True; if true output conditional probabilities p(s_i given backbone)

Default: 0


ProteinMPNN: 0 for False, 1 for True; output unconditional probabilities p(s_i given backbone) in one forward pass

Default: 0


ProteinMPNN: Standard deviation of Gaussian noise to add to backbone atoms

Default: 0.0


ProteinMPNN: Batch size; can set higher for titan, quadro GPUs, reduce this if running out of GPU memory

Default: 1


ProteinMPNN: Define which chains need to be designed for a single PDB

Default: “”


ProteinMPNN: Path to a dictionary specifying which chains need to be designed and which ones are fixed. If not specified, all chains will be designed.

Default: “”


ProteinMPNN: Path to a dictionary with fixed positions

Default: “”


ProteinMPNN: Specify which amino acids should be omitted in the generated sequence, e.g. ‘AC’ would omit alanine and cysteine.

Default: X


ProteinMPNN: Path to a dictionary which specifies AA composition bias if needed, e.g. {A: -1.1, F: 0.7} would make A less likely and F more likely.

Default: “”


ProteinMPNN: Path to dictionary with per position bias.

Default: “”


ProteinMPNN: Path to a dictionary which specifies which amino acids need to be omitted from design at specific chain indices

Default: “”


ProteinMPNN: Path to a dictionary with pssm

Default: “”


ProteinMPNN: A value between [0.0, 1.0], 0.0 means do not use pssm, 1.0 ignore MPNN predictions

Default: 0.0


ProteinMPNN: A value between -inf and +inf to restrict per-position AAs

Default: 0.0


ProteinMPNN: 0 for False, 1 for True

Default: 0


ProteinMPNN: 0 for False, 1 for True

Default: 0


ProteinMPNN: Path to a dictionary with tied positions

Default: “”
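
Several ProteinMPNN options above take a path to a dictionary file. As a sketch of what such a file might look like for --bias_AA_jsonl, the snippet below writes the {A: -1.1, F: 0.7} example as a single JSON object on one line. The one-object-per-line layout, the helper name `write_bias_jsonl`, and the file name `bias_AA.jsonl` are assumptions; check the ProteinMPNN helper scripts for the exact format each option expects.

```python
import json
import os
import tempfile


def write_bias_jsonl(path, bias):
    """Write an amino-acid bias dictionary as one JSON object on one line."""
    with open(path, "w") as handle:
        handle.write(json.dumps(bias) + "\n")


# Negative values make a residue less likely, positive values more likely.
bias = {"A": -1.1, "F": 0.7}
bias_path = os.path.join(tempfile.mkdtemp(), "bias_AA.jsonl")  # hypothetical file name
write_bias_jsonl(bias_path, bias)

with open(bias_path) as handle:
    written = handle.read().strip()
print(written)
```

The resulting path is what would be passed to --bias_AA_jsonl.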


Generate proteins using large language models

trill lang_gen [-h] [--finetuned FINETUNED] [--esm2_arch ESM2_ARCH]
               [--temp TEMP] [--ctrl_tag CTRL_TAG] [--batch_size BATCH_SIZE]
               [--seed_seq SEED_SEQ] [--max_length MAX_LENGTH]
               [--do_sample DO_SAMPLE] [--top_k TOP_K]
               [--repetition_penalty REPETITION_PENALTY]
               [--num_return_sequences NUM_RETURN_SEQUENCES] [--random_fill]
               [--num_positions NUM_POSITIONS]

Positional Arguments


Possible choices: ESM2, ProtGPT2, ZymCTRL

Choose desired language model

Named Arguments


Input path to your own finetuned model

Default: False


ESM2_Gibbs: Choose which ESM2 architecture your finetuned model is

Default: “esm2_t12_35M_UR50D”


Choose sampling temperature.

Default: “1”


ZymCTRL: Choose an Enzyme Commission (EC) control tag for conditional protein generation based on the tag. You can find all ECs here


Change batch-size number to modulate how many proteins are generated at a time. Default is 1

Default: 1


Sequence to seed generation, the default is M.

Default: “M”


Max length of proteins generated, default is 100

Default: 100


ProtGPT2/ZymCTRL: Whether or not to use sampling for generation; use greedy decoding otherwise

Default: True


The number of highest probability vocabulary tokens to keep for top-k-filtering

Default: 950


ProtGPT2/ZymCTRL: The parameter for repetition penalty, the default is 1.2. 1.0 means no penalty

Default: 1.2


Number of sequences to generate. Default is 1

Default: 1


ESM2_Gibbs: Randomly select positions to fill each iteration for Gibbs sampling with ESM2. If not called then fill the positions in order

Default: True


ESM2_Gibbs: Generate new AAs for this many positions each iteration for Gibbs sampling with ESM2. If 0, then generate for all target positions each round.

Default: 0
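
The --temp and --top_k options control how the next token is sampled. The sketch below shows the general idea of temperature scaling followed by top-k filtering on a toy score table. It is a generic illustration with made-up scores and a hypothetical function name, not the code that TRILL or the underlying models run.

```python
import math


def top_k_probabilities(logits, k, temperature=1.0):
    """Apply temperature, keep the k highest-scoring tokens, and renormalize."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    kept = sorted(scaled, key=scaled.get, reverse=True)[:k]
    # Softmax over the surviving tokens only (shifted by the max for stability).
    top = max(scaled[tok] for tok in kept)
    weights = {tok: math.exp(scaled[tok] - top) for tok in kept}
    total = sum(weights.values())
    return {tok: weight / total for tok, weight in weights.items()}


toy_logits = {"A": 2.0, "L": 1.5, "G": 0.5, "W": -1.0}  # made-up scores
probs = top_k_probabilities(toy_logits, k=2)
print(sorted(probs))  # only the two highest-scoring tokens survive
```

A larger k or a higher temperature spreads probability over more tokens, which tends to give more diverse sequences; smaller values push generation toward the most likely tokens.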


Generate proteins using RFDiffusion

trill diff_gen [-h] [--contigs CONTIGS]
               [--RFDiffusion_Override RFDIFFUSION_OVERRIDE]
               [--num_return_sequences NUM_RETURN_SEQUENCES]
               [--Inpaint INPAINT] [--query QUERY] [--partial_T PARTIAL_T]
               [--partial_diff_fix PARTIAL_DIFF_FIX] [--hotspots HOTSPOTS]

Named Arguments


Generate proteins between these sizes in AAs for RFDiffusion. For example, --contigs 100-200 will result in proteins in this range


Change RFDiffusion model. For example, --RFDiffusion_Override ActiveSite selects the ActiveSite model, which is meant for holding small motifs in place.

Default: False


Number of sequences for RFDiffusion to generate. Default is 5

Default: 5


Residues to inpaint.


Input pdb file for motif scaffolding, partial diffusion etc.


Adjust partial diffusion sampling value.


Pass the residues that you want to keep fixed for your input pdb during partial diffusion. Note that the residues should be 0-indexed.


Define residues that the binder must interact with. For example, --hotspots A30,A33,A34, where A is the chain and the numbers are the residue indices.
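
The hotspot string packs a chain ID and a residue index into each comma-separated entry. Here is a minimal sketch of splitting such a string apart, assuming single-letter chain IDs as in the A30,A33,A34 example. The function name `parse_hotspots` is hypothetical and is not part of TRILL.

```python
def parse_hotspots(spec):
    """Split 'A30,A33,A34' into (chain, residue index) tuples."""
    hotspots = []
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma
        chain, index = entry[0], entry[1:]
        if not chain.isalpha() or not index.isdigit():
            raise ValueError(f"Malformed hotspot entry: {entry!r}")
        hotspots.append((chain, int(index)))
    return hotspots


parsed = parse_hotspots("A30,A33,A34")
print(parsed)
```

Validating the string this way before launching a run can catch a swapped chain letter or a stray character early.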


Classify proteins using either pretrained classifiers or train/test your own.

trill classify [-h] [--key KEY] [--save_emb]
               [--emb_model {esm2_t6_8M,esm2_t12_35M,esm2_t30_150M,esm2_t33_650M,esm2_t36_3B,esm2_t48_15B,ProtT5-XL,ProstT5,Ankh,Ankh-Large}]
               [--train_split TRAIN_SPLIT] [--preTrained PRETRAINED]
               [--preComputed_Embs PRECOMPUTED_EMBS] [--batch_size BATCH_SIZE]
               [--xg_gamma XG_GAMMA] [--xg_lr XG_LR]
               [--xg_max_depth XG_MAX_DEPTH] [--xg_reg_alpha XG_REG_ALPHA]
               [--xg_reg_lambda XG_REG_LAMBDA]
               [--if_contamination IF_CONTAMINATION]
               [--n_estimators N_ESTIMATORS] [--sweep] [--sweep_cv SWEEP_CV]
               [--f1_avg_method {macro,weighted,micro,None}]
               {TemStaPro,EpHod,XGBoost,iForest} query

Positional Arguments


Possible choices: TemStaPro, EpHod, XGBoost, iForest

Predict thermostability/optimal enzymatic pH using TemStaPro/EpHod or choose custom to train/use your own XGBoost or Isolation Forest classifier. Note for training XGBoost, you need to submit roughly equal amounts of each class as part of your query.


Fasta file of sequences to score

Named Arguments


Input a CSV, with your class mappings for your embeddings where the first column is the label and the second column is the class.


Save csv of ProtT5 embeddings

Default: False


Possible choices: esm2_t6_8M, esm2_t12_35M, esm2_t30_150M, esm2_t33_650M, esm2_t36_3B, esm2_t48_15B, ProtT5-XL, ProstT5, Ankh, Ankh-Large

Select desired protein language model for embedding your query proteins to then train your custom classifier. Default is esm2_t12_35M

Default: “esm2_t12_35M”


Choose your train-test percentage split for training and evaluating your custom classifier. For example, --train_split .6 would split your input sequences into two groups, one with 60% of the sequences for training and the other with 40% for evaluation.


Enter the path to your pre-trained XGBoost binary classifier that you’ve trained with TRILL. This will be a .json file.


Enter the path to your pre-computed embeddings. Make sure they match the --emb_model you select.

Default: False


EpHod: Sets batch_size for embedding with ESM1v.

Default: 1


XGBoost: sets gamma for XGBoost, which is a hyperparameter that sets ‘Minimum loss reduction required to make a further partition on a leaf node of the tree.’

Default: 0.4


XGBoost: Sets the learning rate for XGBoost

Default: 0.2


XGBoost: Sets the maximum tree depth

Default: 8


XGBoost: L1 regularization term on weights

Default: 0.8


XGBoost: L2 regularization term on weights

Default: 0.1


iForest: The proportion of outliers in the data. Default is automatically determined, but you can set it to a value in (0, 0.5]

Default: “auto”


XGBoost/iForest: Number of boosting rounds

Default: 115


XGBoost: Use this flag to perform cross-validated bayesian optimization over the hyperparameter space.

Default: False


XGBoost: Change the number of folds used for cross-validation.

Default: 3


Possible choices: macro, weighted, micro, None

XGBoost: Change the scoring method used for calculating F1. Default is with no averaging.
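
To illustrate what a train-test split of .6 does, the sketch below shuffles a list of sequence IDs and divides it 60/40. It is a generic illustration under assumed names (`split_ids`, the seed, and the ID strings); TRILL's own splitting logic may differ, for example by stratifying on class.

```python
import random


def split_ids(ids, train_fraction, seed=123):
    """Shuffle IDs reproducibly and split them into train and test lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


ids = [f"seq_{i}" for i in range(10)]  # hypothetical sequence IDs
train, test = split_ids(ids, 0.6)
print(len(train), len(test))  # 6 4
```

Fixing the seed keeps the split reproducible between runs, so evaluation metrics are comparable.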


Predict 3D protein structures using ESMFold or obtain 3Di structure for use with Foldseek to perform remote homology detection

trill fold [-h] [--strategy STRATEGY] [--batch_size BATCH_SIZE]
           {ESMFold,ProstT5} query

Positional Arguments


Possible choices: ESMFold, ProstT5

Choose your desired model.


Input fasta file

Named Arguments


ESMFold: Choose a specific strategy if you are running out of CUDA memory. You can also pass either 64, or 32 for model.trunk.set_chunk_size(x)


ESMFold: Change batch-size number for folding proteins. Default is 1

Default: 1


Reduce dimensionality of embeddings to 2D

trill visualize [-h] [--method {PCA,UMAP,tSNE}] [--key KEY] embeddings

Positional Arguments


Embeddings to be visualized

Named Arguments


Possible choices: PCA, UMAP, tSNE

Method for reducing dimensions of embeddings. Default is PCA

Default: “PCA”


Input a CSV, with your group mappings for your embeddings where the first column is the label and the second column is the group to be colored.

Default: False
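
As a sketch of the two-column key file described above, the snippet below writes sequence labels in the first column and their groups in the second. The header names, labels, groups, and file name are all assumptions made for illustration; match them to whatever your embeddings file and TRILL version actually expect.

```python
import csv
import os
import tempfile


def write_group_key(path, groups, header=("Label", "Group")):
    """Write a label-to-group key: first column label, second column group."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)  # header names are an assumption
        for label, group in groups.items():
            writer.writerow([label, group])


groups = {"protA": "kinase", "protB": "kinase", "protC": "protease"}  # made-up data
key_path = os.path.join(tempfile.mkdtemp(), "key.csv")  # hypothetical file name
write_group_key(key_path, groups)

with open(key_path, newline="") as handle:
    rows = list(csv.reader(handle))
print(rows[1])  # ['protA', 'kinase']
```

The labels in the first column need to match the labels attached to the embeddings, otherwise points cannot be assigned a color group.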


Use MD to relax protein structures

trill simulate [-h] [--ligand LIGAND]
               [--constraints {None,HBonds,AllBonds,HAngles}] [--rigidWater]
               [--forcefield FORCEFIELD] [--solvent SOLVENT] [--solvate]
               [--step_size STEP_SIZE] [--num_steps NUM_STEPS]
               [--reporting_interval REPORTING_INTERVAL]
               [--output_traj_dcd OUTPUT_TRAJ_DCD]
               [--apply-harmonic-force APPLY_HARMONIC_FORCE]
               [--force-constant FORCE_CONSTANT] [--z0 Z0]
               [--molecule-atom-indices MOLECULE_ATOM_INDICES]
               [--equilibration_steps EQUILIBRATION_STEPS]
               [--periodic_box PERIODIC_BOX] [--just_relax]
               [--reporter_interval REPORTER_INTERVAL]

Positional Arguments


Receptor of interest to be simulated. Must be either pdb file or a .txt file with the absolute path for each pdb, separated by a new-line.

Named Arguments


Ligand of interest to be simulated with input receptor


Possible choices: None, HBonds, AllBonds, HAngles

Specifies which bonds and angles should be implemented with constraints. Allowed values are None, HBonds, AllBonds, or HAngles.

Default: “None”


If true, water molecules will be fully rigid regardless of the value passed for the constraints argument.


Force field to use. Default is amber14-all.xml

Default: “amber14-all.xml”


Solvent model to use, the default is amber14/tip3pfb.xml

Default: “amber14/tip3pfb.xml”


Add to solvate your simulation

Default: False


Step size in femtoseconds. Default is 2

Default: 2


Number of simulation steps

Default: 5000


Reporting interval for simulation

Default: 1000


Output trajectory DCD file

Default: “trajectory.dcd”


Whether to apply a harmonic force to pull the molecule.

Default: False


Force constant for the harmonic force in kJ/mol/nm^2.


The z-coordinate to pull towards in nm.


Comma-separated list of atom indices to which the harmonic force will be applied.

Default: “0,1,2”


Steps you want to take for NVT and NPT equilibration. Each step is 0.002 picoseconds

Default: 300


Give, in nm, one of the dimensions to build the periodic boundary.

Default: 10


Just relaxes the input structure(s) and outputs the fixed and relaxed structure(s). The forcefield that is used is amber14.

Default: False


Set interval to save PDB and energy snapshot. Note that the higher the number, the bigger the output files will be and the slower the simulation. Default is 1000

Default: 1000
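
The step size and step counts above translate directly into simulated time. Below is a small worked calculation using the defaults listed on this page; the function name `simulated_time_ps` is hypothetical.

```python
def simulated_time_ps(num_steps, step_size_fs=2):
    """Convert a step count and a step size in femtoseconds to picoseconds."""
    return num_steps * step_size_fs / 1000  # 1 ps = 1000 fs


production_ps = simulated_time_ps(5000)    # default --num_steps at 2 fs per step
equilibration_ps = simulated_time_ps(300)  # default --equilibration_steps
print(production_ps, equilibration_ps)  # 10.0 0.6
```

With the defaults, the production run covers 10 ps, so reaching nanosecond timescales requires raising --num_steps by a few orders of magnitude.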


Perform molecular docking with proteins and ligands. Note that you should relax your protein receptor with Simulate or another method before docking.

trill dock [-h] [--save_visualisation]
           [--samples_per_complex SAMPLES_PER_COMPLEX] [--no_final_step_noise]
           [--inference_steps INFERENCE_STEPS] [--actual_steps ACTUAL_STEPS]
           [--min_radius MIN_RADIUS] [--max_radius MAX_RADIUS]
           [--min_alpha_spheres MIN_ALPHA_SPHERES]
           [--exhaustiveness EXHAUSTIVENESS] [--blind] [--anm]
           [--swarms SWARMS] [--sim_steps SIM_STEPS] [--restraints RESTRAINTS]
           {DiffDock,Vina,Smina,LightDock,GeoDock} protein
           [ligand [ligand ...]]

Positional Arguments


Possible choices: DiffDock, Vina, Smina, LightDock, GeoDock

Note that while LightDock can dock protein ligands, DiffDock, Smina, and Vina can only do small-molecules.


Protein of interest to be docked with ligand


Ligand to dock protein with. Note that with Autodock Vina, you can dock multiple ligands at one time. Simply provide them one after another before any other optional TRILL arguments are added. Also, if a .txt file is provided with each line providing the absolute path to different ligands, TRILL will dock each ligand one at a time.

Named Arguments


DiffDock: Save a pdb file with all of the steps of the reverse diffusion.

Default: False


DiffDock: Number of samples to generate.

Default: 10


DiffDock: Use no noise in the final step of the reverse diffusion

Default: False


DiffDock: Number of denoising steps

Default: 20


DiffDock: Number of denoising steps that are actually performed


Smina/Vina + Fpocket: Minimum radius of alpha spheres in a pocket. Default is 3Å.

Default: 3.0


Smina/Vina + Fpocket: Maximum radius of alpha spheres in a pocket. Default is 6Å.

Default: 6.0


Smina/Vina + Fpocket: Minimum number of alpha spheres a pocket must contain to be considered. Default is 35.

Default: 35


Smina/Vina: Change computational effort.

Default: 8


Smina/Vina: Perform blind docking and skip binding pocket prediction with fpocket

Default: False


LightDock: If selected, backbone flexibility is modeled using Anisotropic Network Model (via ProDy)

Default: False


LightDock: The number of swarms of the simulations, default is 25

Default: 25


LightDock: The number of steps of the simulation. Default is 100

Default: 100


LightDock: If restraints_file is provided, residue restraints will be considered during the setup and the simulation


Misc utilities

trill utils [-h] [--dir DIR] [--fasta_paths_txt FASTA_PATHS_TXT]
            [--uniprotDB {UniProtKB,A.thaliana,C.elegans,E.coli,H.sapiens,M.musculus,R.norvegicus,SARS-CoV-2}]
            [--rep {per_AA,avg}]

Positional Arguments


Possible choices: prepare_class_key, fetch_embeddings

prepare_class_key: Prepare a csv for use with the classify command. Takes a directory or text file with list of paths for fasta files. Each file will be a unique class, so if your directory contains 5 fasta files, there will be 5 classes in the output key csv.

Named Arguments


Directory to be used for creating a class key csv for classification.


Text file with absolute paths of fasta files to be used for creating the class key. Each unique path will be treated as a unique class, and all the sequences in that file will be in the same class.


Possible choices: UniProtKB, A.thaliana, C.elegans, E.coli, H.sapiens, M.musculus, R.norvegicus, SARS-CoV-2

UniProt embedding dataset to download.


Possible choices: per_AA, avg

The representation to download.