Would this neural network have short term memory?

I want to design a NN that can remember its last 7 actions and use them as inputs. For example, it would be able to store words in its memory: if it had a choice of 10 different actions, the number of words it could store would be $10^7$.



Here is my design:

$$out_{n+1} = f(out_n, in_n)\,\mathbf{N} + out_n \cdot \mathbf{M}$$

$$action_n = \sigma(\mathbf{N} \cdot out_n)$$



Here $f$ represents some layered neural network. Some of the actions would be physical actions and some might be internal (such as thinking of the letter 'C').



Basically I want $out_n$ to be an array that keeps the last 6 action values and puts them back in. So $M$ will be the matrix:

$$\begin{bmatrix}
0&1&0&0&0&0\\
0&0&1&0&0&0\\
0&0&0&1&0&0\\
0&0&0&0&1&0\\
0&0&0&0&0&1\\
0&0&0&0&0&0
\end{bmatrix}$$



i.e. it would drop the 6th item from its memory,

and $N$ would be the vector:

$$\begin{bmatrix}
1&0&0&0&0&0&0
\end{bmatrix}$$
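
The update rule above can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the memory length `K = 6` matches the $6 \times 6$ matrix $M$ (the question mentions both 6 and 7), $N$ is cut to the same length, and `echo` is a hypothetical stand-in for the layered network $f$ that simply returns the current input.

```python
import numpy as np

K = 6  # memory length: an assumption, since the question mentions both 6 and 7

# Shift matrix M from the question: ones on the superdiagonal.
M = np.eye(K, k=1)

# Selector N: marks the first memory slot.
N = np.zeros(K)
N[0] = 1.0


def step(out, inp, f):
    """One update: out_{n+1} = f(out_n, in_n) * N + out_n @ M."""
    return f(out, inp) * N + out @ M


def echo(out, inp):
    """Hypothetical stand-in for the layered network f: returns the input."""
    return float(inp)


def run(inputs, f=echo):
    """Feed a sequence of inputs through the update, starting from zeros."""
    out = np.zeros(K)
    for x in inputs:
        out = step(out, x, f)
    return out


if __name__ == "__main__":
    # Newest value sits in slot 0; values older than K steps are shifted out.
    print(run(range(1, 9)))
```

With `echo` as $f$, feeding the inputs 1 to 8 leaves only the six most recent values in the buffer, newest first, so the state behaves like a fixed-length shift register.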



I think this would be equivalent to an equation of the form:

$$out_{n+1}=F(in_n,out_n,out_{n-1},out_{n-2},\dots,out_{n-6})$$



So I think this would be an advantage over an RNN, since this model remembers precisely its last 6 actions. But would this be better than an RNN or worse? One could increase its memory to more than 7 quite easily.

I think it's basically the same architecture as an RNN, except that it eliminates a lot of the connections. Is this a new design or a common design?

One problem with this design is that you might also want a memory that spans longer time periods (e.g. for actions that take more than one tick). But that might be solved by enhancing the architecture.







Tags: neural-networks, long-short-term-memory






asked 8 hours ago by zooby
edited 6 hours ago by zooby




















          1 Answer
Congrats, you have invented 1D convolution. Convolution combined with an RNN would have some advantage over an RNN alone. Think about the receptive field: in this layer, you aggregate $6$ values into one. Imagine two such layers: that is $36$ already, and so on. But you still need an RNN at the end to aggregate a variable-length sequence into a constant-length representation.
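
To make the receptive-field remark concrete, here is a small sketch using the standard receptive-field recurrence for stacked 1D convolutions. The helper name `receptive_field` and the stride values are assumptions for illustration, not something stated in the answer: two layers of width 6 cover 36 inputs when the windows do not overlap (stride 6), while with stride 1 they cover 11.

```python
def receptive_field(kernel_sizes, strides):
    """Receptive field of stacked 1D convolutions (no dilation).

    Each layer adds (kernel - 1) steps, scaled by the product of the
    strides of all layers below it.
    """
    rf = 1    # one output position sees one input before any layer
    jump = 1  # distance in input samples between adjacent positions
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf


if __name__ == "__main__":
    print(receptive_field([6, 6], [6, 6]))  # non-overlapping windows
    print(receptive_field([6, 6], [1, 1]))  # sliding windows, stride 1
```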







answered 7 hours ago by user8426627
edited 4 hours ago by nbro











• Well that's good! Glad I'm on the right track! (Not sure what you mean at the end about variable lengths.)
  – zooby, 7 hours ago










• @zooby This is not a 1D CNN; it's a non-differentiable RNN (actions must be sampled from some categorical distribution, based on what's described). The only similarity to a 1D CNN is the sliding window.
  – mshlis, 7 hours ago










• Why is it non-differentiable?
  – zooby, 7 hours ago











• You train with sequences of different lengths, right? Also, if you feed the output back in as input, keep in mind that the output may be wrong, so you could consider force-feeding (using the expected data instead of the output).
  – user8426627, 6 hours ago










• I could be wrong, but generally actions are drawn from a distribution (that's why you show one-hot encodings), and you can't differentiate through a categorical distribution.
  – mshlis, 6 hours ago
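
The differentiability point in the comments can be illustrated numerically. The sketch below is an assumption-laden toy, not the setup from the question: it uses `one_hot_argmax` as a deterministic stand-in for sampling a discrete action, and a central finite difference (`finite_diff`, a hypothetical helper) in place of real backpropagation. The softmax probabilities respond smoothly to a small change in the logits, while the discrete one-hot action does not change at all, so no useful gradient flows through it.

```python
import numpy as np


def softmax(z):
    """Smooth map from logits to action probabilities."""
    e = np.exp(z - z.max())
    return e / e.sum()


def one_hot_argmax(z):
    """Discrete action choice: a deterministic stand-in for sampling."""
    a = np.zeros_like(z)
    a[np.argmax(z)] = 1.0
    return a


def finite_diff(fn, z, i, eps=1e-4):
    """Central finite-difference derivative of fn with respect to z[i]."""
    dz = np.zeros_like(z)
    dz[i] = eps
    return (fn(z + dz) - fn(z - dz)) / (2 * eps)


if __name__ == "__main__":
    logits = np.array([1.0, 2.0, 0.5])
    print(finite_diff(softmax, logits, 0))         # nonzero: usable gradient
    print(finite_diff(one_hot_argmax, logits, 0))  # all zeros: no signal
```

This is why policies with discrete actions are usually trained with estimators that work around the sampling step rather than by differentiating through it directly.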















