How Does AlphaGo Zero Implement Reinforcement Learning?


AlphaGo Zero (https://deepmind.com/blog/alphago-zero-learning-scratch/) has several key components that contribute to its success:

  1. A Monte Carlo Tree Search (MCTS) algorithm that allows it to better search and learn from the state space of Go.

  2. A Deep Neural Network architecture that learns the value and policy of given states, to better inform the MCTS.

My question is, how is this Reinforcement Learning? Or rather, what aspects of this algorithm specifically make it a Reinforcement Learning problem? Couldn't this just be considered a Supervised Learning problem?

reinforcement-learning monte-carlo-tree-search supervised-learning alphago-zero go

asked 8 hours ago by SeeDerekEngineer

1 Answer

If you learn a policy or a value function from experience (that is, from interaction with an environment), that's RL. In the case of AlphaGo Zero, the experience is acquired through self-play guided by MCTS.

RL could in fact be viewed as a form of supervised learning (SL) or, more specifically, self-supervised learning, in which the experience plays the role of the labels in SL. This is especially true nowadays with techniques like experience replay, where stored experience is sampled for training much like a labelled dataset.
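
As a rough illustration of the point about experience acting as labels (this is not AlphaGo Zero's actual code), here is a minimal Python sketch. It assumes a toy counting game instead of Go and a uniform placeholder in place of MCTS; every name in it (`search_policy`, `self_play_game`, `sample_batch`, and so on) is hypothetical.

```python
import random
from collections import deque

TARGET = 10      # toy game: players alternately add 1 or 2; whoever reaches TARGET wins
MOVES = (1, 2)


def search_policy(total):
    """Placeholder for MCTS: one probability per move in MOVES.

    A real system would run a tree search guided by the network here;
    this stand-in is simply uniform over the legal moves.
    """
    legal = [m for m in MOVES if total + m <= TARGET]
    return [1.0 / len(legal) if m in legal else 0.0 for m in MOVES]


def self_play_game(rng):
    """Play one game against itself and return labelled training examples."""
    total, player = 0, 0
    history = []                      # (state, search probabilities, player to move)
    while total < TARGET:
        probs = search_policy(total)
        move = rng.choices(MOVES, weights=probs)[0]
        history.append((total, probs, player))
        total += move
        player = 1 - player
    winner = 1 - player               # the player who made the last move
    # The outcome is only known once the game ends. It then becomes the value
    # label of every visited state, from the point of view of the player to move.
    return [(state, probs, 1 if mover == winner else -1)
            for state, probs, mover in history]


def sample_batch(buffer, batch_size, rng):
    """Experience replay: draw a minibatch as one would from a labelled dataset."""
    batch = rng.sample(list(buffer), min(batch_size, len(buffer)))
    states = [s for s, _, _ in batch]
    policy_targets = [p for _, p, _ in batch]
    value_targets = [z for _, _, z in batch]
    return states, policy_targets, value_targets


rng = random.Random(0)
replay_buffer = deque(maxlen=1000)    # old experience is dropped once the buffer is full
for _ in range(20):
    replay_buffer.extend(self_play_game(rng))

states, policy_targets, value_targets = sample_batch(replay_buffer, 8, rng)
```

The sampled `states` are the inputs, and `policy_targets` and `value_targets` are the labels, so the training step itself looks like SL. The difference is that nobody supplied those labels: they come from the agent's own play and change as the agent improves, which is the RL part.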













edited 8 hours ago

answered 8 hours ago by nbro