
Selecting a neural network architecture for a supervised machine learning problem

  • Publication Date:
    May 28, 2024
  • Additional Information
    • Patent Number:
      11,995,538
    • Appl. No:
      15/976,514
    • Application Filed:
      May 10, 2018
    • Abstract:
      Systems and methods for selecting a neural network for a machine learning problem are disclosed. A method includes accessing an input matrix. The method includes accessing a machine learning problem space associated with a machine learning problem and multiple untrained candidate neural networks for solving the machine learning problem. The method includes computing, for each untrained candidate neural network, at least one expressivity measure capturing an expressivity of the candidate neural network with respect to the machine learning problem. The method includes computing, for each untrained candidate neural network, at least one trainability measure capturing a trainability of the candidate neural network with respect to the machine learning problem. The method includes selecting, based on the at least one expressivity measure and the at least one trainability measure, at least one candidate neural network for solving the machine learning problem. The method includes providing an output representing the selected at least one candidate neural network.
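The abstract describes a selection loop over untrained candidate networks: score each candidate's expressivity and trainability against the problem space, then keep those whose scores satisfy the selection criteria. A minimal sketch of that loop, assuming hypothetical scoring functions and illustrative threshold values (none of which are specified by the patent):

```python
# Sketch of the selection procedure described in the abstract.
# All names, thresholds, and scoring functions are illustrative
# assumptions, not the patented implementation.
from typing import Callable, List, Tuple

def select_candidates(
    candidates: List[str],                     # identifiers of untrained networks
    expressivity: Callable[[str], float],      # hypothetical scoring function
    trainability: Callable[[str], float],      # hypothetical scoring function
    expr_threshold: float = 0.5,               # "expressivity ... exceeding a threshold"
    train_range: Tuple[float, float] = (0.1, 0.9),  # "trainability ... within a range"
) -> List[str]:
    """Keep candidates whose expressivity exceeds the threshold and
    whose trainability falls within the given range (per claim 1)."""
    selected = []
    for net in candidates:
        e = expressivity(net)  # computed without training the network
        t = trainability(net)  # e.g. observed during an early phase of training
        if e > expr_threshold and train_range[0] <= t <= train_range[1]:
            selected.append(net)
    return selected
```

For example, a candidate with high expressivity but trainability outside the range would be rejected, as would one whose expressivity falls below the threshold.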
    • Inventors:
      Microsoft Technology Licensing, LLC (Redmond, WA, US)
    • Assignees:
      Microsoft Technology Licensing, LLC (Redmond, WA, US)
    • Claim:
      1. A system comprising: processing hardware; and a memory storing instructions which cause the processing hardware to perform operations comprising: accessing a machine learning problem space associated with a machine learning problem and a plurality of untrained candidate neural networks for solving the machine learning problem, the machine learning problem space comprising data to be processed by a trained neural network; computing, for each untrained candidate neural network, at least one expressivity measure based on data of the machine learning problem space, the expressivity measure capturing an expressivity of the candidate neural network with respect to the machine learning problem, the expressivity measure being computed without training the candidate neural network; computing, for each untrained candidate neural network, at least one trainability measure based on data of the machine learning problem space, the trainability measure capturing a trainability of the candidate neural network with respect to the machine learning problem; selecting, based on the at least one expressivity measure and the at least one trainability measure, at least one candidate neural network for solving the machine learning problem, wherein selecting the at least one candidate neural network for solving the machine learning problem comprises selecting the at least one candidate neural network having the at least one expressivity measure exceeding a threshold and the at least one trainability measure within a range; and providing an output representing the selected at least one candidate neural network.
    • Claim:
      2. The system of claim 1, wherein the at least one expressivity measure represents a measure of separation of samples from the machine learning problem space.
    • Claim:
      3. The system of claim 2, wherein the measure of separation is a magnitude.
    • Claim:
      4. The system of claim 2, wherein the measure of separation is an angle.
    • Claim:
      5. The system of claim 1, wherein the at least one trainability measure represents a stochastic gradient descent of weights in the candidate neural network during a first phase of training.
    • Claim:
      6. The system of claim 1, the operations further comprising: training the selected at least one candidate neural network to solve the machine learning problem.
    • Claim:
      7. The system of claim 6, the operations further comprising: running the trained at least one candidate neural network on the machine learning problem space in order to solve the machine learning problem; and providing a solution to the machine learning problem generated by the trained at least one candidate neural network.
    • Claim:
      8. A non-transitory machine-readable medium storing instructions which cause one or more machines to perform operations comprising: accessing a machine learning problem space associated with a machine learning problem and a plurality of untrained candidate neural networks for solving the machine learning problem, the machine learning problem space comprising data to be processed by a trained neural network; computing, for each untrained candidate neural network, at least one expressivity measure based on data of the machine learning problem space, the expressivity measure capturing an expressivity of the candidate neural network with respect to the machine learning problem, the expressivity measure being computed without training the candidate neural network; computing, for each untrained candidate neural network, at least one trainability measure based on data of the machine learning problem space, the trainability measure capturing a trainability of the candidate neural network with respect to the machine learning problem; selecting, based on the at least one expressivity measure and the at least one trainability measure, at least one candidate neural network for solving the machine learning problem, wherein selecting the at least one candidate neural network for solving the machine learning problem comprises selecting the at least one candidate neural network having the at least one expressivity measure exceeding a threshold and the at least one trainability measure within a range; and providing an output representing the selected at least one candidate neural network.
    • Claim:
      9. The machine-readable medium of claim 8, wherein the at least one expressivity measure represents a measure of separation of samples from the machine learning problem space.
    • Claim:
      10. The machine-readable medium of claim 9, wherein the measure of separation is a magnitude.
    • Claim:
      11. The machine-readable medium of claim 9, wherein the measure of separation is an angle.
    • Claim:
      12. The machine-readable medium of claim 8, wherein the at least one trainability measure represents a stochastic gradient descent of weights in the candidate neural network during a first phase of training.
    • Claim:
      13. A method comprising: accessing a machine learning problem space associated with a machine learning problem and a plurality of untrained candidate neural networks for solving the machine learning problem, the machine learning problem space comprising data to be processed by a trained neural network; computing, for each untrained candidate neural network, at least one expressivity measure based on data of the machine learning problem space, the expressivity measure capturing an expressivity of the candidate neural network with respect to the machine learning problem, the expressivity measure being computed without training the candidate neural network; computing, for each untrained candidate neural network, at least one trainability measure based on data of the machine learning problem space, the trainability measure capturing a trainability of the candidate neural network with respect to the machine learning problem; selecting, based on the at least one expressivity measure and the at least one trainability measure, at least one candidate neural network for solving the machine learning problem, wherein selecting the at least one candidate neural network for solving the machine learning problem comprises selecting the at least one candidate neural network having the at least one expressivity measure exceeding a threshold and the at least one trainability measure within a range; and providing an output representing the selected at least one candidate neural network.
    • Claim:
      14. The method of claim 13, wherein the at least one expressivity measure represents a measure of separation of samples from the machine learning problem space.
    • Claim:
      15. The method of claim 14, wherein the measure of separation is a magnitude.
    • Claim:
      16. The method of claim 14, wherein the measure of separation is an angle.
    • Claim:
      17. The method of claim 13, wherein the at least one trainability measure represents a stochastic gradient descent of weights in the candidate neural network during a first phase of training.
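Claims 2-5 (mirrored in the medium and method claims) name concrete instances of the two measures: expressivity as a separation of samples, quantified as a magnitude or an angle, and trainability as the stochastic gradient descent of weights during a first phase of training. An illustrative NumPy-only sketch of such quantities; the patent does not disclose these exact formulas, so the definitions below are assumptions:

```python
# Illustrative "separation" measures (claims 2-4) between two samples'
# representations h1, h2 at the output of an untrained candidate network,
# plus a simple trainability proxy (claim 5). These formulas are
# assumptions for illustration, not the patented computations.
import numpy as np

def separation_magnitude(h1: np.ndarray, h2: np.ndarray) -> float:
    """Euclidean distance between two samples' representations."""
    return float(np.linalg.norm(h1 - h2))

def separation_angle(h1: np.ndarray, h2: np.ndarray) -> float:
    """Angle, in radians, between two samples' representations."""
    cos = np.dot(h1, h2) / (np.linalg.norm(h1) * np.linalg.norm(h2))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))  # clip guards rounding error

def trainability_measure(update_norms: np.ndarray) -> float:
    """Mean magnitude of SGD weight updates observed during a first,
    short phase of training (an illustrative proxy only)."""
    return float(np.mean(update_norms))
```

Orthogonal representations, for instance, give an angle of π/2, while identical ones give zero separation under both measures.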
    • Patent References Cited:
      20150339572 November 2015 Achin et al.
      H05314090 November 1993
      2015533437 November 2015
      2017520068 July 2017
      2017058489 April 2017
    • Other References:
      Brock et al., “SMASH: One-Shot Model Architecture Search through HyperNetworks”, In Journal of Computing Research Repository, Aug. 2017, pp. 1-21 (Year: 2017). cited by examiner
      Hamel et al., “Transfer Learning in MIR: Sharing Learned Latent Representations for Music Audio Classification and Similarity,” 2013, International Society for Music Information Retrieval, 6 pages (Year: 2013). cited by examiner
      Elsken et al., “Simple and Efficient Architecture Search for Convolutional Neural Networks,” 2017, arXiv:1711.04528v1 [stat.ML], pp. 1-14 (Year: 2017). cited by examiner
      Baker, et al., “Accelerating neural architecture search using performance prediction”, Retrieved From: <>, May 2017, pp. 1-7. cited by applicant
      Bello, et al., “Neural Optimizer Search with Reinforcement Learning”, In Proceedings of the 34th International Conference on Machine Learning, Aug. 6, 2017, 10 Pages. cited by applicant
      Brock, et al., “SMASH: One-Shot Model Architecture Search through HyperNetworks”, In Journal of Computing Research Repository, Aug. 17, 2017, pp. 1-21. cited by applicant
      Fortunato, et al., “Bayesian recurrent neural networks”, In Journal of Computing Research Repository, Apr. 10, 2017, pp. 1-11. cited by applicant
      Hutter, et al., “Sequential Model-Based Optimization for General Algorithm Configuration”, In Proceedings of the 5th International Conference on Learning and Intelligent Optimization, Jan. 17, 2011, 24 Pages. cited by applicant
      Liu, et al., “Hierarchical Representations for Efficient Architecture Search”, In Journal of Computing Research Repository, Nov. 1, 2017, pp. 1-13. cited by applicant
      Liu, et al., “Progressive neural architecture search”, In Journal of Computing Research Repository, Dec. 2017, 11 Pages. cited by applicant
      Mendoza, et al., “Towards Automatically-Tuned Neural Networks”, In Proceedings of the Workshop on Automatic Machine Learning, Dec. 4, 2016, pp. 58-65. cited by applicant
      Negrinho, et al., “DeepArchitect: Automatically designing and training deep architectures”, In Computing Research Repository, Apr. 28, 2017, pp. 1-12. cited by applicant
      Schoenholz, et al., “A Correspondence between Random Neural Networks and Statistical Field Theory”, In Journal of Computing Research Repository, Oct. 2017, pp. 1-36. cited by applicant
      Xie, et al., “Genetic CNN”, In Proceedings of the IEEE International Conference on Computer Vision, Oct. 22, 2017, pp. 1388-1397. cited by applicant
      Yang, et al., “Mean field residual networks: On the edge of chaos”, In Proceedings of the 31st Conference on Neural Information Processing Systems, Dec. 4, 2017, pp. 1-53. cited by applicant
      Zoph, et al., “Learning Transferable Architectures for Scalable Image Recognition”, In Journal of Computing Research Repository, Jul. 2017, 14 Pages. cited by applicant
      Zoph, et al., “Neural architecture search with reinforcement learning”, In Journal of Computing Research Repository, Nov. 2016, pp. 1-15. cited by applicant
      “International Search Report and Written Opinion Issued in PCT Application No. PCT/US2019/029532”, dated Aug. 9, 2019, 12 Pages. cited by applicant
      Dong, et al., “PPAP-NET: Pareto-Optimal Platform-Aware Progressive Architecture Search”, Retrieved from https://openreview.net/references/pdf?id=B1NT3TAIM, Feb. 12, 2018, 4 Pages. cited by applicant
      Finn, et al., “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks”, In Proceedings of the 34th International Conference on Machine Learning, vol. 70, Jul. 18, 2017, 13 Pages. cited by applicant
      Kandasamy, et al., “Neural Architecture Search with Bayesian Optimisation and Optimal Transport”, In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Feb. 11, 2018, 26 Pages. cited by applicant
      Casale, et al., “Probabilistic Neural Architecture Search”, Retrieved from https://arxiv.org/pdf/1902.05116.pdf, Feb. 13, 2019, 13 Pages. cited by applicant
      “Office Action Issued in European Patent Application No. 19722485.0”, dated Jul. 28, 2023, 7 Pages. cited by applicant
      “Notice of Allowance Issued in Japanese Patent Application No. 2020-555022”, dated Aug. 7, 2023, 6 Pages. cited by applicant
      “Office Action Issued in India Patent Application No. 202017048355”, dated Aug. 8, 2022, 6 Pages. cited by applicant
      “Office Action Issued in Japanese Patent Application No. 2020-555022”, dated Apr. 4, 2023, 15 Pages. cited by applicant
    • Primary Examiner:
      Gonzales, Vincent
    • Attorney, Agent or Firm:
      Schwegman Lundberg & Woessner, P.A.
    • Identifier:
      edspgr.11995538