How Does AlphaGo Zero Implement Reinforcement Learning?
AlphaGo Zero (https://deepmind.com/blog/alphago-zero-learning-scratch/) has several key components that contribute to its success:
- A Monte Carlo Tree Search (MCTS) algorithm that allows it to search, and learn from, the state space of Go more effectively.
- A deep neural network architecture that learns the value and policy of a given state, to better guide the MCTS.
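To make the second point concrete: in AlphaGo Zero the network's policy output P(s, a) steers the search through a PUCT-style rule that picks the action maximising Q(s, a) + U(s, a), where U(s, a) = c_puct · P(s, a) · sqrt(Σ_b N(s, b)) / (1 + N(s, a)). The sketch below is a minimal illustration under stated assumptions: the `Edge` class, the `select_action` helper and the `c_puct` default of 1.5 are hypothetical choices, not DeepMind's code.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Edge:
    """Search statistics for one (state, action) edge of the MCTS tree (hypothetical class)."""
    prior: float            # P(s, a), taken from the network's policy head
    visits: int = 0         # N(s, a), number of times this edge was traversed
    value_sum: float = 0.0  # W(s, a), sum of values backed up through this edge

    @property
    def q(self) -> float:
        # Mean action value Q(s, a) = W / N; unvisited edges start at 0.
        return self.value_sum / self.visits if self.visits else 0.0


def select_action(edges: dict[str, Edge], c_puct: float = 1.5) -> str:
    """Return the action maximising Q(s, a) + U(s, a) (PUCT-style selection).

    U(s, a) = c_puct * P(s, a) * sqrt(sum_b N(s, b)) / (1 + N(s, a)),
    so moves the network rates highly are explored first, and the bonus
    shrinks as an edge accumulates visits. c_puct = 1.5 is an assumed value.
    """
    sqrt_total = math.sqrt(sum(e.visits for e in edges.values()))

    def score(edge: Edge) -> float:
        exploration = c_puct * edge.prior * sqrt_total / (1 + edge.visits)
        return edge.q + exploration

    # With no visits yet every bonus is 0 and ties go to the first key (a simplification).
    return max(edges, key=lambda action: score(edges[action]))
```

For instance, with priors 0.7, 0.2 and 0.1 and only the second move visited (3 visits, mean value 0.5), the rule still picks the first move, because its prior-weighted exploration bonus outweighs the second move's Q. With `c_puct=0.0` the bonus vanishes and the search greedily follows Q instead.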
My question is, how is this Reinforcement Learning? Or rather, what aspects of this algorithm specifically make it a Reinforcement Learning problem? Couldn't this just be considered a Supervised Learning problem?
reinforcement-learning monte-carlo-tree-search supervised-learning alphago-zero go
asked 8 hours ago
SeeDerekEngineer
1 Answer
If you learn a policy or a value function from experience (that is, from interaction with an environment), that is RL. In the case of AlphaGo Zero, MCTS is used to acquire that experience through self-play.
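How MCTS turns self-play into experience can be sketched as follows. In AlphaGo Zero each position from a self-play game is stored with two targets: the search probabilities π, derived from the MCTS visit counts as π(a) ∝ N(s, a)^(1/τ), and the final game outcome z, scored from the perspective of the player to move. The `Sample` class, `visit_counts_to_policy` and `label_game` are hypothetical names, the board state is left abstract, and scoring draws as 0 is an assumption added for generality; this is a minimal illustration, not DeepMind's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """One training example produced by self-play (hypothetical class)."""
    state: object               # board position (representation left abstract)
    policy_target: list[float]  # pi: normalised MCTS visit counts
    value_target: float         # z: final result for the player to move in `state`


def visit_counts_to_policy(counts: list[int], temperature: float = 1.0) -> list[float]:
    """pi(a) proportional to N(s, a) ** (1 / temperature).

    temperature <= 0 is treated as the limit tau -> 0: all mass on the most-visited move.
    Assumes at least one move has been visited (otherwise the total is 0).
    """
    if temperature <= 0:
        best = max(range(len(counts)), key=lambda i: counts[i])
        return [1.0 if i == best else 0.0 for i in range(len(counts))]
    powered = [c ** (1.0 / temperature) for c in counts]
    total = sum(powered)
    return [p / total for p in powered]


def label_game(history: list[tuple[object, str, list[int]]], winner: Optional[str]) -> list[Sample]:
    """Turn one finished self-play game into (s, pi, z) samples.

    history holds (state, player_to_move, visit_counts) for every move played;
    winner is the winning player's id, or None for a draw (an assumption).
    """
    samples = []
    for state, player, counts in history:
        if winner is None:
            z = 0.0
        else:
            z = 1.0 if player == winner else -1.0
        samples.append(Sample(state, visit_counts_to_policy(counts), z))
    return samples
```

Nothing here comes from a human-labelled dataset: both π and z are produced by the agent's own search and play, which is what makes the procedure reinforcement learning rather than learning from fixed labels.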
RL can in fact be viewed as a form of supervised learning (SL) or, more specifically, self-supervised learning, in which the experience gathered by the agent itself plays the role of the labels in SL. This view is especially apt nowadays, with techniques like experience replay.
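To illustrate the supervised-learning view: once self-play has produced (s, π, z) tuples, training looks like ordinary regression plus classification. The AlphaGo Zero paper trains its network (outputs p and v) on the loss l = (z − v)² − πᵀ log p + c‖θ‖². The sketch below pairs a simple FIFO replay buffer with that per-sample loss; `ReplayBuffer` and `alphazero_style_loss` are hypothetical names, the buffer capacity is an assumption, and the L2 term c‖θ‖² is assumed to be handled by the optimiser as weight decay.

```python
import math
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO store of (state, pi, z) self-play samples (hypothetical class)."""

    def __init__(self, capacity: int):
        # Once full, appending a new sample silently drops the oldest one.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, policy_target, value_target):
        self.buffer.append((state, policy_target, value_target))

    def sample(self, batch_size: int, rng=random):
        # Uniform minibatch without replacement; smaller if the buffer holds fewer samples.
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)


def alphazero_style_loss(policy_pred, value_pred, policy_target, value_target):
    """Per-sample loss (z - v)^2 - pi . log p, with the L2 term left to weight decay.

    The value head is regressed onto the game outcome z and the policy head is
    trained by cross-entropy against the MCTS search probabilities pi: both
    "labels" come from the agent's own experience.
    """
    value_loss = (value_target - value_pred) ** 2
    # Skip zero-probability targets so that 0 * log(0) never has to be evaluated.
    policy_loss = -sum(t * math.log(p) for t, p in zip(policy_target, policy_pred) if t > 0)
    return value_loss + policy_loss
```

A training step would draw `buffer.sample(batch_size)`, run the network on each state and average this loss over the minibatch; that part depends on the deep-learning framework and is omitted here.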
edited 8 hours ago
answered 8 hours ago
nbro