TCR-REPERTOIRE FRAMEWORK FOR MULTIPLE DISEASE DIAGNOSIS

  • Publication Date:
    August 29, 2024
  • Additional Information
    • Document Number:
      20240290418
    • Appl. No:
      18/571515
    • Application Filed:
      June 17, 2022
    • Abstract:
      A novel method of geometric isometry-based antigen-specific TCR alignment (GIANA) is described herein. GIANA is an antigen-specific TCR clustering method that is able to efficiently handle tens of millions of sequences. GIANA achieved higher sensitivity and precision than all existing methods, and is able to retrieve TCRs specific to known antigens with high accuracy. The ultra-large-scale TCR clustering and fast query of novel samples also enabled a novel reference-based repertoire classification framework. GIANA can also analyze single cell RNA-seq data with TCR regions solved, and it is possible to query TCRs from unknown data against the large database of TCR repertoire samples in the public domain, and provide new insights into shared antigen specificity. GIANA can also be applied to cluster or query large B cell receptor sequencing data.
    • Assignees:
      THE BOARD OF REGENTS OF THE UNIVERSITY OF TEXAS SYSTEM (Austin, TX, US)
    • Claim:
      1. A method of improving computational efficiency for T-cell receptor (TCR) comparisons, the method comprising: identifying, by a computing device, complementary determining region 3 (CDR3) sequences from a reference TCR sequence (TCR-seq) dataset, the reference TCR-seq dataset consisting of TCRs specific to only one epitope; encoding, by the computing device, each of the CDR3 sequences from the reference TCR-seq dataset into numeric vectors, the numeric vectors corresponding to a sequence of amino acids in each of the CDR3 sequences; converting, by the computing device, the numeric vectors to coordinates in a high-dimensional Euclidean space; generating, by the computing device, a predictive model using a neural network by: learning, by the neural network, to generate a tree data structure of the numeric vectors based on relative distances of the coordinates, and grouping, by the neural network, the coordinates into pre-clusters based on the relative distances; filtering, by the computing device, the CDR3 sequences in the pre-clusters; and identifying, by the computing device, antigen-specific CDR3 clusters from the filtered pre-clusters.
    • Claim:
      2. The method of claim 1, further comprising: performing, by the computing device, the identifying, encoding, converting, generating, and filtering steps on a query TCR-seq dataset, the query TCR-seq dataset having no known antigen-specific TCR information; comparing, by the computing device, the filtered pre-clusters from the query TCR-seq dataset to the antigen-specific CDR3 clusters; and determining, by the computing device, that the filtered pre-clusters from the query TCR-seq dataset match the antigen-specific CDR3 clusters to diagnose and/or determine disease status.
    • Claim:
      3. The method of claim 1, further comprising: grouping, by the computing device, CDR3 sequences having identical coordinates together.
    • Claim:
      4. The method of claim 1, wherein the filtering comprises: comparing, by the computing device, TCR variable (TRBV) alleles of each pair of CDR3 sequences in the pre-clusters to determine an alignment score; and splitting, by the computing device, the pre-clusters into one or more new pre-clusters if the score is above a predetermined level.
    • Claim:
      5. The method of claim 4, wherein the filtering further comprises: performing, by the computing device, a Smith-Waterman alignment on each of the pre-clusters to determine an alignment score; and removing, by the computing device, a pre-cluster if the score is below a predetermined level.
    • Claim:
      6. The method of claim 1, wherein the encoding comprises: performing, by the computing device, a sequence of unitary transformations on each of the CDR3 sequences.
    • Claim:
      7. A computing device comprising: a processor operatively coupled to a memory storing non-transitory computer-readable instructions that, when executed by the processor, cause the processor to: identify complementary determining region 3 (CDR3) sequences from a reference TCR sequence (TCR-seq) dataset, the reference TCR-seq dataset consisting of TCRs specific to only one epitope; encode each of the CDR3 sequences from the reference TCR-seq dataset into numeric vectors, the numeric vectors corresponding to a sequence of amino acids in each of the CDR3 sequences; convert the numeric vectors to coordinates in a high-dimensional Euclidean space; generate a predictive model using a neural network by: learning, by the neural network, to generate a tree data structure of the numeric vectors based on relative distances of the coordinates, and grouping, by the neural network, the coordinates into pre-clusters based on the relative distances; filter the CDR3 sequences in the pre-clusters; and identify antigen-specific CDR3 clusters from the filtered pre-clusters.
    • Claim:
      8. The computing device of claim 7, wherein the computer-readable instructions, when executed by the processor, further cause the processor to: perform the identifying, encoding, converting, generating, and filtering steps on a query TCR-seq dataset, the query TCR-seq dataset having no known antigen-specific TCR information; compare the filtered pre-clusters from the query TCR-seq dataset to the antigen-specific CDR3 clusters; and determine that the filtered pre-clusters from the query TCR-seq dataset match the antigen-specific CDR3 clusters to diagnose and/or determine disease status.
    • Claim:
      9. The computing device of claim 7, wherein the computer-readable instructions, when executed by the processor, further cause the processor to: group CDR3 sequences having identical coordinates together.
    • Claim:
      10. The computing device of claim 7, wherein the filtering comprises: comparing, by the computing device, TCR variable (TRBV) alleles of each pair of CDR3 sequences in the pre-clusters to determine an alignment score; and splitting, by the computing device, the pre-clusters into one or more new pre-clusters if the score is above a predetermined level.
    • Claim:
      11. The computing device of claim 10, wherein the filtering further comprises: performing, by the computing device, a Smith-Waterman alignment on each of the pre-clusters to determine an alignment score; and removing, by the computing device, a pre-cluster if the score is below a predetermined level.
    • Claim:
      12. The computing device of claim 7, wherein the encoding comprises: performing, by the computing device, a sequence of unitary transformations on each of the CDR3 sequences.
    • Claim:
      13. A non-transitory computer-readable storage medium tangibly encoded with computer-executable instructions, that when executed by a processor associated with a computing device, cause the processor to: identify complementary determining region 3 (CDR3) sequences from a reference TCR sequence (TCR-seq) dataset, the reference TCR-seq dataset consisting of TCRs specific to only one epitope; encode each of the CDR3 sequences from the reference TCR-seq dataset into numeric vectors, the numeric vectors corresponding to a sequence of amino acids in each of the CDR3 sequences; convert the numeric vectors to coordinates in a high-dimensional Euclidean space; generating a predictive model using a neural network by: learning, by the neural network, to generate a tree data structure of the numeric vectors based on relative distances of the coordinates, and group, by the neural network, the coordinates into pre-clusters based on the relative distances; filter the CDR3 sequences in the pre-clusters; and identify antigen-specific CDR3 clusters from the filtered pre-clusters.
    • Claim:
      14. The non-transitory computer-readable storage medium of claim 13, wherein the computer-executable instructions, when executed by the processor, cause the processor to: perform the identifying, encoding, converting, generating, and filtering steps on a query TCR-seq dataset, the query TCR-seq dataset having no known antigen-specific TCR information; compare the filtered pre-clusters from the query TCR-seq dataset to the antigen-specific CDR3 clusters; and determine that the filtered pre-clusters from the query TCR-seq dataset match the antigen-specific CDR3 clusters to diagnose and/or determine disease status.
    • Claim:
      15. The non-transitory computer-readable storage medium of claim 13, wherein the computer-executable instructions, when executed by the processor, cause the processor to: group CDR3 sequences having identical coordinates together.
    • Claim:
      16. The non-transitory computer-readable storage medium of claim 13, wherein the filtering comprises: comparing, by the computing device, TCR variable (TRBV) alleles of each pair of CDR3 sequences in the pre-clusters to determine an alignment score; and splitting, by the computing device, the pre-clusters into one or more new pre-clusters if the score is above a predetermined level.
    • Claim:
      17. The non-transitory computer-readable storage medium of claim 16, wherein the filtering further comprises: performing, by the computing device, a Smith-Waterman alignment on each of the pre-clusters to determine an alignment score; and removing, by the computing device, a pre-cluster if the score is below a predetermined level.
    • Claim:
      18. The non-transitory computer-readable storage medium of claim 13, wherein the encoding comprises: performing, by the computing device, a sequence of unitary transformations on each of the CDR3 sequences.
    • Claim:
      19. A method of organizing and querying a T-cell receptor (TCR) database using common antigen specificity, the method comprising: performing a nearest neighbor search using one or more TCR dissimilarity metrics to find pairs of TCRs with common antigen specificity.
    • Claim:
      20. The method of claim 19, wherein the one or more TCR dissimilarity metrics comprise one or more of a Smith-Waterman distance and an embedding in a high-dimensional Euclidean space; or any other distance or dissimilarity metric.
    • Current International Class:
      16; 16; 16
    • Accession Number:
      edspap.20240290418
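
Claims 5, 11, 17, and 20 refer to a Smith-Waterman alignment score that is compared against a predetermined level. As an illustration only, the following is a minimal sketch of how such a score could be computed and used to drop low-scoring pre-clusters. The record does not give the scoring scheme, so the match, mismatch, and gap values are assumptions, as is the choice to score a pre-cluster by its lowest pairwise alignment score. The names smith_waterman_score and filter_preclusters are hypothetical and do not come from the application.

```python
from itertools import combinations


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty).

    The scoring values are illustrative assumptions, not taken from the record.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def filter_preclusters(preclusters, min_score):
    """Keep pre-clusters whose lowest pairwise score reaches min_score.

    Using the minimum pairwise score as the cluster score is an assumption.
    Pre-clusters with fewer than two sequences are dropped.
    """
    kept = []
    for cluster in preclusters:
        scores = [smith_waterman_score(x, y) for x, y in combinations(cluster, 2)]
        if scores and min(scores) >= min_score:
            kept.append(cluster)
    return kept


if __name__ == "__main__":
    clusters = [["CASSLGQAYEQYF", "CASSLGQGYEQYF"], ["CASSLGQAYEQYF", "CAWRTDNEKLFF"]]
    print(filter_preclusters(clusters, min_score=20))
```

With these assumed parameters, two CDR3 sequences that differ at a single position score close to the self-alignment score, while unrelated sequences score far lower, so a threshold near the self-alignment score keeps only tightly related pre-clusters.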
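
Claims 1, 3, 19, and 20 describe encoding CDR3 sequences as numeric vectors, placing them in a high-dimensional Euclidean space, grouping sequences with identical coordinates, and searching for nearby pairs. The sketch below illustrates that general idea under loudly stated assumptions: it uses a plain one-hot encoding as a stand-in (not the isometric encoding or unitary transformations named in the claims), compares only sequences of equal length, and replaces the tree-based search with a brute-force scan. The names one_hot_encode and neighbor_pairs are hypothetical.

```python
from itertools import combinations
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_encode(cdr3):
    """Map a CDR3 string to a flat one-hot vector (20 slots per residue).

    Stand-in encoding for illustration; not the encoding of the application.
    """
    vec = []
    for residue in cdr3:
        slot = [0.0] * len(AMINO_ACIDS)
        slot[AMINO_ACIDS.index(residue)] = 1.0
        vec.extend(slot)
    return tuple(vec)


def neighbor_pairs(cdr3s, radius):
    """Return pairs of distinct, equal-length CDR3s within `radius` of each other."""
    # Identical sequences share identical coordinates, so collapse them first.
    unique = sorted(set(cdr3s))
    coords = {s: one_hot_encode(s) for s in unique}
    pairs = []
    for a, b in combinations(unique, 2):
        # Brute-force scan; a tree index would replace this loop at scale.
        if len(a) == len(b) and dist(coords[a], coords[b]) <= radius:
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    seqs = ["CASSLGQF", "CASSLGQY", "CASRRRRF", "CASSLGQF"]
    print(neighbor_pairs(seqs, radius=1.5))
```

Under this one-hot stand-in, the Euclidean distance between two equal-length sequences is the square root of twice their number of mismatched positions, so a radius of 1.5 admits pairs that differ at exactly one position.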