METHOD AND APPARATUS WITH ARABIC INFORMATION EXTRACTION AND SEMANTIC SEARCH

  • Publication Date:
    January 11, 2024
  • Additional Information
    • Document Number:
      20240012840
    • Appl. No:
      18/097793
    • Application Filed:
      January 17, 2023
    • Abstract:
      An Arabic information extraction apparatus includes one or more processors configured to: receive a query comprising a long query or a short query; extract, using one or more language models, a named entity and a keyword from the query to generate extracted information; classify, using one or more classification models, the query to generate a classified query; convert the classified query and the extracted information into a dense vector representation; and determine and output a similarity match between the dense vector representation and a document vector representation of a knowledge base comprising an Islamic law document.
    • Assignees:
      Elm Company (Riyadh, SA)
    • Claim:
      1. An Arabic information extraction apparatus, comprising: one or more processors configured to: receive a query comprising a long query or a short query; extract, using one or more language models, a named entity and a keyword from the query to generate extracted information; classify, using one or more classification models, the query to generate a classified query; convert the classified query and the extracted information into a dense vector representation; and determine and output a similarity match between the dense vector representation and a document vector representation of a knowledge base comprising an Islamic law document.
    • Claim:
      2. The Arabic information extraction apparatus of claim 1, wherein the Islamic law document comprises a Quran document and a Hadith document.
    • Claim:
      3. The Arabic information extraction apparatus of claim 2, wherein the one or more processors are further configured to: collect raw data and explanations of the Quran document from online sources; parse the raw data of the Quran document into chapters using a predetermined phrase; bind determined verses of the chapters to corresponding verse explanation of the explanations; and convert the Quran document, the Hadith document, and the bound determined verses into document vector representation.
    • Claim:
      4. The Arabic information extraction apparatus of claim 3, wherein the predetermined phrase comprises “In the name of Allah, the Merciful.”
    • Claim:
      5. The Arabic information extraction apparatus of claim 1, wherein the knowledge base further comprises at least one of legal cases, legislative laws, royal decrees, laws, Arabic Legal Content (ALC) raw data, or Sharia ruling documents, or any combination thereof, into the document vector representation.
    • Claim:
      6. The Arabic information extraction apparatus of claim 1, wherein at least one of the one or more classification models comprises a segmentation layer configured to segment and output sentences of the long query into context sentences.
    • Claim:
      7. The Arabic information extraction apparatus of claim 1, wherein at least one of the one or more classification models comprises a semantic layer configured to determine and output similarity matches between sentences of the long query.
    • Claim:
      8. The Arabic information extraction apparatus of claim 1, wherein the one or more processors are further configured to extract the keyword by segmenting the long query into word groups with a predetermined number of words to generate word candidates; embed the word candidates and compare their proximity to embeddings of the long query; and generate the extracted information based on a result of the comparison.
    • Claim:
      9. The Arabic information extraction apparatus of claim 1, further comprising a memory configured to store instructions; wherein the one or more processors are further configured to execute the instructions to configure the one or more processors to receive the query, extract the named entity and the keyword from the query to generate the extracted information, classify the query to generate the classified query, convert the classified query and the extracted information into the dense vector representation, determine and output the similarity match between the dense vector representation and the document vector representation of the knowledge base.
    • Claim:
      10. The Arabic information extraction apparatus of claim 1, wherein the long query has a maximum of 4500 words or 15 pages of texts, and the short query comprises a specific word.
    • Claim:
      11. The Arabic information extraction apparatus of claim 1, wherein the named entity comprises any one or any two or more of “Legal Bond-Hadith,” “Legal Bond-Quran,” “Legal Bond-jurisprudence,” “Law,” “Occupation,” “Organization,” “person,” “Address,” “Documents,” “Verdict,” “Accusation,” “Evidence,” “Citation,” “period,” “currency,” “nationality,” “amount,” and “date.”
    • Claim:
      12. The Arabic information extraction apparatus of claim 1, wherein the one or more language models comprise a named entity recognition model comprising plural layers, a first layer of the plural layers comprising pre-trained models, a second layer of the plural layers comprising a fully connected linear layer configured to receive an output of the pre-trained models, and a third layer of the plural layers being a conditional random fields (CRFs) layer configured to receive an output of the second layer.
    • Claim:
      13. The Arabic information extraction apparatus of claim 1, wherein the classified query comprises a category and a subcategory, and the subcategory is a subdivision of the category.
    • Claim:
      14. The Arabic information extraction apparatus of claim 1, wherein the classified query comprises a case category and a case class subcategory that is a subdivision of the case category.
    • Claim:
      15. The Arabic information extraction apparatus of claim 1, wherein the keyword comprises any one of “themes and facts,” or “the circuit ruled.”
    • Claim:
      16. An Arabic information extraction method, comprising: receiving a query comprising a long query or a short query; extracting, using one or more language models, a named entity and a keyword from the query to generate extracted information; classifying, using one or more classification models, the query to generate a classified query; converting the classified query and the extracted information into a dense vector representation; and determining and outputting a similarity match between the dense vector representation and a document vector representation of a knowledge base comprising an Islamic law document.
    • Claim:
      17. The method of claim 16, wherein the Islamic law document comprises a Quran document and a Hadith document.
    • Claim:
      18. The method of claim 16, wherein the knowledge base further comprises at least one of legal cases, legislative laws, royal decrees, laws, Arabic Legal Content (ALC) raw data, or Sharia ruling documents, or any combination thereof, into the document vector representation.
    • Claim:
      19. The method of claim 16, wherein at least one of the one or more classification models comprises a segmentation layer configured to segment and output sentences of the long query into context sentences.
    • Claim:
      20. A non-transitory computer-readable storage medium storing instructions that, when executed by one or more processors, configure the one or more processors to perform the method of claim 16.
    • Current International Class:
      06; 06; 06
    • Accession Number:
      edspap.20240012840
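
Claim 8 describes keyword extraction in three steps: segment the long query into word groups of a predetermined size, embed each group and compare its proximity to the embedding of the whole query, then keep the best-scoring groups. The Python sketch below illustrates that flow under loud assumptions. The record does not name an embedding model, so `toy_embed` (hashed character trigrams) is a hypothetical stand-in, and the function names, `n`, and `top_k` are illustrative choices rather than anything taken from the application.

```python
import math
import zlib


def toy_embed(text, dim=256):
    """Hypothetical stand-in for a language-model embedding.

    Hashes character trigrams into a fixed-size vector and L2-normalises it.
    A real system would call a trained model here instead.
    """
    vec = [0.0] * dim
    padded = f" {text} "
    for i in range(len(padded) - 2):
        bucket = zlib.crc32(padded[i:i + 3].encode("utf-8")) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    # Both vectors are unit length, so the dot product equals the cosine.
    return sum(x * y for x, y in zip(a, b))


def word_groups(text, n):
    """Slide a window of n words over the text to build candidate groups."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def extract_keywords(query, n=2, top_k=3, embed=toy_embed):
    """Rank n-word candidates by similarity to the whole query."""
    query_vec = embed(query)
    # Remove duplicate candidates while keeping first-seen order.
    candidates = list(dict.fromkeys(word_groups(query, n)))
    scored = [(cosine(embed(c), query_vec), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]


if __name__ == "__main__":
    sample = "the circuit ruled that the defendant must repay the amount"
    print(extract_keywords(sample, n=2, top_k=3))
```

Passing a different `embed` callable swaps in a real model without changing the ranking logic, which is the only part the claim language pins down.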
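
Claims 3 and 4 describe parsing raw text into chapters using a predetermined phrase and binding each verse to its explanation. The sketch below shows one way this could look in Python. The input layout (the phrase opening each chapter, one verse per line) and the explanation lookup keyed by `(chapter, verse)` are assumptions made for illustration, the sample text is placeholder data, and a real corpus would need extra handling for any chapter that does not open with the phrase.

```python
# Delimiter taken from claim 4; everything else here is a hypothetical layout.
DELIMITER = "In the name of Allah, the Merciful"


def split_chapters(raw, delimiter=DELIMITER):
    """Split raw text into chapters, each a list of non-empty verse lines.

    Any text before the first delimiter is discarded.
    """
    chapters = []
    for chunk in raw.split(delimiter)[1:]:
        verses = [line.strip() for line in chunk.splitlines() if line.strip()]
        chapters.append(verses)
    return chapters


def bind_explanations(chapters, explanations):
    """Attach an explanation to each verse; None when no explanation exists."""
    bound = []
    for ch_no, verses in enumerate(chapters, start=1):
        for v_no, verse in enumerate(verses, start=1):
            bound.append({
                "chapter": ch_no,
                "verse": v_no,
                "text": verse,
                "explanation": explanations.get((ch_no, v_no)),
            })
    return bound


if __name__ == "__main__":
    raw = (
        "In the name of Allah, the Merciful\n"
        "placeholder verse one\n"
        "placeholder verse two\n"
        "In the name of Allah, the Merciful\n"
        "placeholder verse three\n"
    )
    notes = {(1, 1): "note on 1:1", (2, 1): "note on 2:1"}
    for record in bind_explanations(split_chapters(raw), notes):
        print(record)
```

The bound records are plain dictionaries, so each one could then be passed to whatever encoder produces the document vector representation mentioned in claim 3.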