How to specify and fit a hybrid machine learning - linear model




I want to understand how a dependent variable y depends on an independent variable x through a known relationship, and also how x may interact with a high-dimensional, complex set of features (microbiome data, which can be represented as thousands of predictors per observation of y). The general form of the model, which I already know, is:



y ~ m1*x + b



However, I would like to add an interaction term, which gives the model:



y ~ m1*x + m2*x*microbiome + b



where microbiome is the very large microbiome feature set.



I think the microbiome data interacts with the independent variable x to affect y, but I don't know how. There are thousands of species, and my guess is that certain combinations of them will predict this interaction and explain a lot of the variation, but I don't know a priori which ones. I also suspect that the microbiome features relate to each other non-linearly. Essentially, I want to use a machine learning approach to work that out. I am aware of "boosted" regressions, where a machine learning algorithm is fitted to the residuals; however, I want to specify something a bit more mechanistic than that.



If anyone could suggest a method for doing something like this (especially one that can be implemented in R), I would be very interested. I would also be interested in a way to do this using the residuals of the model. Note that in the actual application I have many more predictors in the model, many of which have non-linear relationships with y.










  • Why is it not sufficient to specify the interaction in the usual way? – Sycorax, 8 hours ago










  • @Sycorax The microbiome feature set has thousands of columns, and I don't think any individual column should have a relationship with y. I think it's far more likely that different combinations of the microbiome features interact with x to predict y, but I don't know which combinations. – colin, 8 hours ago
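The exchange above can be made concrete with a minimal NumPy sketch (an illustration added here, using synthetic data): fitting the interaction "in the usual way" means one m2 coefficient per microbiome column, so the design matrix has 2 + p columns. The helper name interaction_design and the sizes n = 50, p = 1000 are hypothetical.

```python
import numpy as np

def interaction_design(x, microbiome):
    """Columns [1, x, x*microbiome_1, ..., x*microbiome_p] for the 'usual'
    interaction model y ~ b + m1*x + sum_j m2_j * x * microbiome_j."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)      # shape (n, 1)
    microbiome = np.asarray(microbiome, dtype=float)   # shape (n, p)
    intercept = np.ones_like(x)                        # column for b
    return np.hstack([intercept, x, x * microbiome])   # shape (n, 2 + p)

# Synthetic, illustrative sizes: 50 samples, 1000 microbiome features
rng = np.random.default_rng(0)
n, p = 50, 1000
x = rng.normal(size=n)
microbiome = rng.normal(size=(n, p))

X = interaction_design(x, microbiome)
print(X.shape)                    # (50, 1002)
print(np.linalg.matrix_rank(X))   # at most 50, far fewer than 1002 coefficients
```

With more columns than observations the matrix has rank at most n, so ordinary least squares cannot identify all the coefficients; some form of regularization or dimensionality reduction, such as the neural network proposed in the answer, is needed.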

















Tags: machine-learning, boosting






asked 8 hours ago by colin (edited 8 hours ago)




1 Answer
The easiest way to implement a model like



y ~ m1*x + m2*x*microbiome + b


would be to replace microbiome with a dense neural network, nn:



y ~ m1*x + m2*x*nn(microbiome) + b


so that the neural network nn reduces the microbiome features to a lower-dimensional representation (one or several dimensions, depending on the number of units in its output layer) and does the feature engineering for you. The nice part is that you keep the assumed form of the model, while the neural network handles the extra features.
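A framework-free way to see the structure is a minimal NumPy sketch of the forward pass only (no training). It assumes the same layer sizes as the Keras code in this answer (200 and 50 ReLU units, then one linear output unit); the function names init_params, nn and predict, the He-style initialisation and the synthetic data are illustrative assumptions.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def init_params(k, hidden=(200, 50), seed=0):
    """Random weights for an MLP k -> 200 -> 50 -> 1 plus the scalars m1, m2, b."""
    rng = np.random.default_rng(seed)
    sizes = (k, *hidden, 1)
    # He-style normal initialisation for each (weights, bias) pair
    layers = [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n_out)), np.zeros(n_out))
              for m, n_out in zip(sizes[:-1], sizes[1:])]
    return {"layers": layers, "m1": 1.0, "m2": 1.0, "b": 0.0}

def nn(params, microbiome):
    """Map an (n, k) microbiome matrix to an (n, 1) learned score."""
    h = microbiome
    *hidden_layers, (W_out, c_out) = params["layers"]
    for W, c in hidden_layers:
        h = relu(h @ W + c)          # hidden ReLU layers
    return h @ W_out + c_out         # linear output layer

def predict(params, x, microbiome):
    """y_hat = m1*x + m2*x*nn(microbiome) + b, returned with shape (n, 1)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    return params["m1"] * x + params["m2"] * x * nn(params, microbiome) + params["b"]

# Illustrative use with synthetic data: 5 samples, 1000 features
rng = np.random.default_rng(1)
x = rng.normal(size=5)
microbiome = rng.normal(size=(5, 1000))
params = init_params(k=1000)
print(predict(params, x, microbiome).shape)   # (5, 1)
```

Setting m2 = 0 recovers y ~ m1*x + b, which is the sense in which the network adds to the assumed model rather than replacing it.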



Frameworks such as Keras, which are designed to handle large datasets and scale well, make this easy. In Keras, the model could be defined roughly as follows. To understand the code you will probably need to dig deeper into Keras, but many tutorials are available online.



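from keras.models import Model
from keras.layers import Input, Dense, multiply, concatenate

# Assumed inputs (not defined in the original snippet):
#   x          -- NumPy array of shape (n, 1)
#   microbiome -- NumPy array of shape (n, k), one column per microbiome feature
#   y          -- NumPy array of shape (n,) or (n, 1)
k = microbiome.shape[1]  # number of microbiome features

x_inp = Input(shape=(1,))
microbiome_inp = Input(shape=(k,))

# 3-layer neural network mapping the k microbiome features to one score nn(microbiome)
nn = Dense(200, activation='relu')(microbiome_inp)
nn = Dense(50, activation='relu')(nn)
nn = Dense(1)(nn)

# x*nn(microbiome): element-wise product of x and the learned score
mul = multiply([x_inp, nn])

# m1*x + m2*x*nn(microbiome) + b: one linear unit on [x, x*nn(microbiome)],
# whose two weights play the roles of m1 and m2 and whose bias is b
conc = concatenate([x_inp, mul])
out = Dense(1)(conc)

model = Model(inputs=[x_inp, microbiome_inp], outputs=out)
model.compile(optimizer='adam', loss='mean_squared_error')
# Keras trains for a single epoch by default; epochs=100 is an arbitrary example
# value, and validation_split holds out 20% of the data to monitor overfitting
model.fit([x, microbiome], y, epochs=100, validation_split=0.2)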
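Under the hood, fitting minimises the mean squared error by gradient-based optimisation. The following is a minimal NumPy sketch of that idea, not Keras's actual implementation: it uses plain full-batch gradient descent instead of Adam, a single hidden layer instead of two, hand-derived gradients, and synthetic data; all names and settings (n, k, h, lr, steps) are illustrative assumptions. It also folds m2 into the network's output weights, because m2 and the scale of nn(microbiome) cannot be estimated separately; likewise, a constant network output acts like an extra slope on x, so m1 is only identified up to that trade-off.

```python
import numpy as np

# Synthetic data (assumption): a linear effect of x plus an interaction
# between x and a nonlinear function of two microbiome features.
rng = np.random.default_rng(0)
n, k, h = 200, 20, 16
x = rng.normal(size=n)
M = rng.normal(size=(n, k))
y = 0.5 + 2.0 * x + x * np.tanh(M[:, 0] * M[:, 1]) + 0.1 * rng.normal(size=n)

# Parameters: m1, b for the linear part; a one-hidden-layer ReLU net for
# g = m2 * nn(M), with m2 folded into the output weights w2 and bias c2.
m1, b = 0.0, 0.0
W1 = rng.normal(0.0, np.sqrt(2.0 / k), size=(k, h))
c1 = np.zeros(h)
w2 = rng.normal(0.0, np.sqrt(1.0 / h), size=h)
c2 = 0.0

def forward(m1, b, W1, c1, w2, c2):
    Z = M @ W1 + c1           # (n, h) pre-activations
    H = np.maximum(Z, 0.0)    # (n, h) ReLU activations
    g = H @ w2 + c2           # (n,)  learned interaction score
    y_hat = m1 * x + x * g + b
    return Z, H, g, y_hat

lr, steps = 0.01, 2000
losses = []
for _ in range(steps):
    Z, H, g, y_hat = forward(m1, b, W1, c1, w2, c2)
    resid = y_hat - y
    losses.append(np.mean(resid ** 2))
    d = 2.0 * resid / n                 # dLoss/dy_hat
    dg = d * x                          # dLoss/dg
    dH = np.outer(dg, w2) * (Z > 0)     # back through the ReLU
    # Gradient descent updates (all gradients use the pre-update values)
    m1 -= lr * np.sum(d * x)
    b -= lr * np.sum(d)
    w2 -= lr * (H.T @ dg)
    c2 -= lr * np.sum(dg)
    W1 -= lr * (M.T @ dH)
    c1 -= lr * dH.sum(axis=0)

print(losses[0], losses[-1])
```

Because of the trade-off between m1 and a constant network output, the fitted m1 need not equal the 2.0 used to simulate the data even when the fit is good; if m1 must be interpreted on its own, additional constraints on the network would be needed.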





  • This is exactly what I am talking about, thanks! Can you link to a tutorial on how to specify and fit a model like this in TensorFlow? – colin, 8 hours ago













answered 8 hours ago by Tim (edited 4 hours ago)



