Is the Keras Embedding layer dependent on the target label?


I have learned how to 'use' the Keras Embedding layer, but I cannot find any more specific information about the actual behavior and training process of this layer. For now, I understand that the Keras Embedding layer maps distinct categorical features to n-dimensional vectors, which allows us to find out, for example, how similar two features are.



What I do not understand is how the vectors in the embedding layer are trained. Here is an explanation stating that these vectors are not computed by any operation and that the layer works only as a lookup table, but I always thought that they are somehow "trained" to capture similarities between distinct features.



If they are trained, are they trained from the target labels, from the order in which the features appear (similar to GloVe, word2vec, etc.), or from both?



I have the following example of two pairs of rows in a dataset, where y is the model's target label and X contains the features, encoded as integers to be used in the embedding layer:



# pair 1: identical feature rows, different target labels
dataset_y_row1 = [1]
dataset_y_row2 = [0]
dataset_X_row1 = [3, 5, 8, 45, 2]
dataset_X_row2 = [3, 5, 8, 45, 2]

# pair 2: same target label, same features in a different order
dataset_y_row3 = [1]
dataset_y_row4 = [1]
dataset_X_row3 = [3, 5, 8, 45, 2]
dataset_X_row4 = [3, 5, 45, 8, 2]
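For reference, this is roughly how I feed such rows into the embedding layer (only a sketch; I am assuming TensorFlow/Keras here, and the vocabulary size of 50 and embedding dimension of 8 are made up):

import numpy as np
import tensorflow as tf

# the four rows from above
X = np.array([[3, 5, 8, 45, 2],
              [3, 5, 8, 45, 2],
              [3, 5, 8, 45, 2],
              [3, 5, 45, 8, 2]], dtype="int32")
y = np.array([1, 0, 1, 1])

model = tf.keras.Sequential([
    tf.keras.Input(shape=(5,), dtype="int32"),               # 5 integer-encoded features per row
    tf.keras.layers.Embedding(input_dim=50, output_dim=8),   # 50 = assumed vocabulary size
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(X, y, epochs=5, verbose=0)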


My questions are the following:



  1. Will the embedding layer see any difference between rows 1 and 2 (i.e. is it 'target-label-sensitive')?

  2. Will the embedding layer see any difference between rows 3 and 4 (i.e. is it sensitive to the order of the features, like word2vec, GloVe, etc.)?









Tags: neural-networks, keras, word-embeddings, embeddings






Asked 10 hours ago by Jan Musil; edited 6 hours ago by Mihai Chelaru.




















          1 Answer



















An embedding layer for a vocabulary of size $m$ that encodes each word into an embedding vector of size $k$ is shorthand for one-hot encoding the words into $m$ features and then putting a dense (linear) layer with $k$ units on top of them. Word2vec and GloVe are specialized algorithms for learning the embeddings, but the end product is the same: a matrix of weights that is multiplied by the one-hot encoded words.
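As a minimal sketch of that equivalence (assuming TensorFlow/Keras; the sizes are made up), the lookup done by an Embedding layer returns the same vectors as multiplying the one-hot encoded words by the layer's weight matrix:

import numpy as np
import tensorflow as tf

m, k = 10, 4                                   # vocabulary size and embedding dimension (made up)
emb = tf.keras.layers.Embedding(input_dim=m, output_dim=k)

ids = np.array([3, 5, 8])                      # three integer-encoded "words"
looked_up = emb(ids).numpy()                   # first call builds the layer and does the lookup
W = emb.get_weights()[0]                       # the (m, k) weight matrix behind the lookup

one_hot = np.eye(m)[ids]                       # the same words, one-hot encoded, shape (3, m)
multiplied = one_hot @ W                       # a linear "dense" layer with no bias

print(np.allclose(looked_up, multiplied))      # True: lookup and matrix multiplication agree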



If you are interested in a detailed yet accessible introductory source on word embeddings, check the series of blog posts by Sebastian Ruder.



To answer your question, one would need to consider your network architecture and the data. Algorithms like word2vec and GloVe are trained on language data to predict things like the next word in a sequence. On the other hand, if you use an embedding layer that is trained from scratch as part of a larger network with some utilitarian purpose (e.g. spam detection, sentiment classification), then the layer works like any other dense layer, so it serves the purpose of automatic feature engineering. In the latter case, you would expect to see more specialized embeddings that learn features related to the objective of your network.
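To make the second case concrete, here is a sketch (again assuming TensorFlow/Keras, with made-up shapes and data) showing that when the embedding layer sits inside a classifier trained on labels, its vectors are updated by the gradients of that label loss, which is what makes the learned embeddings task-specific:

import numpy as np
import tensorflow as tf

X = np.array([[3, 5, 8, 45, 2],
              [3, 5, 8, 45, 2]], dtype="int32")
y = np.array([1, 0])

model = tf.keras.Sequential([
    tf.keras.Input(shape=(5,), dtype="int32"),
    tf.keras.layers.Embedding(input_dim=50, output_dim=4, name="emb"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="sgd", loss="binary_crossentropy")

before = model.get_layer("emb").get_weights()[0].copy()
model.fit(X, y, epochs=1, verbose=0)
after = model.get_layer("emb").get_weights()[0]

print(np.allclose(before, after))   # False: the embedding vectors moved, driven by the label loss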






Answered 9 hours ago by Tim; edited 6 hours ago.
Okay, thanks. Just to ask about "but the end product is a matrix of weights that is multiplied by the one-hot encoded words": does this refer only to word2vec and GloVe, or also to the first part of the paragraph (the Keras Embedding layer)? Does it mean that an embedding vector of size m can simply be simulated by using a one-hot encoded layer as input and a dense layer with m neurons, so that the vector for each one-hot encoded feature is just its m weights going from that input feature to the dense-layer neurons?
– Jan Musil, 7 hours ago










@JanMusil As I said, embeddings are dense layers, so they are matrices of weights to be multiplied by the features; this applies to all embeddings.
– Tim, 6 hours ago










