How to specify and fit a hybrid machine learning - linear model
I want to understand how a dependent variable y depends on an independent variable x through a known relationship, and also how x potentially interacts with a complex, high-dimensional set of features (microbiome data, which can be represented as thousands of predictors per observation of y). I therefore know that the general form of the model is:
y ~ m1*x + b
However, I would like to add an interaction term, giving the model:
y ~ m1*x + m2*x*microbiome + b
where microbiome is the very large microbiome feature set.
I think the microbiome data interacts with the independent variable x to affect y, but I don't know how. There are thousands of species, and my guess is that certain combinations will be predictive of this interaction and explain a lot of variation, but a priori I don't know which. I also suspect that the microbiome features relate to each other in a non-linear way. Essentially, I want to use a machine learning approach to figure that out. I am aware of "boosted" regressions, where you use a machine learning algorithm on the residuals; however, I want to specify something a bit more mechanistic than that.
If anyone could suggest a method for doing something like this (especially one that can be implemented in R), I would be very interested. If there is a way to use the residuals of the model to do this, I would also be interested. It's worth noting that in the actual application I have many more predictors in the model, many of which have non-linear relationships with y.
Tags: machine-learning, boosting
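The residual route mentioned in the question can be sketched in two stages. This is only an illustration under loud assumptions: the data are simulated, the sizes `n` and `k` are hypothetical, and the microbiome function is a ridge-penalised linear stand-in (`w_hat`) rather than a flexible learner. It also assumes the interaction averages out to zero, so that fitting y ~ m1*x + b first does not distort m1 much.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 2000, 20                      # hypothetical sizes, far smaller than real data

# Simulated data (assumption): the interaction depends on three features only.
x = rng.normal(size=n)
M = rng.normal(size=(n, k))          # stand-in for the microbiome feature matrix
w_true = np.zeros(k)
w_true[:3] = [1.0, -1.0, 0.5]
y = 2.0 * x + x * (M @ w_true) + 1.0 + rng.normal(scale=0.1, size=n)

# Stage 1: ordinary least squares for the known part, y ~ m1*x + b.
A = np.column_stack([x, np.ones(n)])
(m1_hat, b_hat), *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - (m1_hat * x + b_hat)

# Stage 2: explain the residuals with x * g(microbiome).
# Here g is linear with a ridge penalty; a flexible learner could take its place.
Z = x[:, None] * M                   # each feature multiplied by x
lam = 1.0                            # ridge penalty (arbitrary choice)
w_hat = np.linalg.solve(Z.T @ Z + lam * np.eye(k), Z.T @ resid)

# Combined prediction and its R^2 on the training data.
y_hat = m1_hat * x + b_hat + Z @ w_hat
r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(m1_hat, b_hat, r2)
```

Because the two stages are fitted separately, the first stage ignores the interaction entirely; a joint fit of all terms avoids that, at the cost of a more involved optimisation.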
Why is it not sufficient to specify the interaction in the usual way?
– Sycorax 8 hours ago
@Sycorax The microbiome feature set has thousands of columns, and I don't think any individual column should have a relationship with y. I think it's far more likely that different combinations of the features in the microbiome dataset interact with x to predict y, but I don't know which combinations.
– colin 8 hours ago
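To make this exchange concrete, here is a small sketch of what "the usual way" presumably means: one product column x*feature per microbiome feature. The helper `interaction_design` and the sizes `n` and `k` are hypothetical, and the data are random placeholders. With more columns than observations, the design matrix cannot have full column rank, so ordinary least squares has no unique solution, and combinations of features are still not represented.

```python
import numpy as np

def interaction_design(x, M):
    """Columns: intercept, x, and one x*M[:, j] product per microbiome feature."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    M = np.asarray(M, dtype=float)
    return np.hstack([np.ones_like(x), x, x * M])

rng = np.random.default_rng(0)
n, k = 100, 1000                     # hypothetical: 100 samples, 1000 features
x = rng.normal(size=n)
M = rng.normal(size=(n, k))          # placeholder for real microbiome data

X = interaction_design(x, M)
rank = np.linalg.matrix_rank(X)

# The rank can never exceed the number of rows, so with k >> n the
# coefficients of a plain linear interaction model are not identified.
print(X.shape, rank)
```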
asked 8 hours ago by colin (edited 8 hours ago)
1 Answer
The easiest way to implement a model like
y ~ m1*x + m2*x*microbiome + b
would be to replace microbiome with a dense neural network:
y ~ m1*x + m2*x*nn(microbiome) + b
so that the neural network nn reduces the dimensionality (to one or several dimensions, depending on the number of units in its output layer) and does the feature engineering for you. The nice part is that this lets you keep the assumed form of the model while the neural network deals with the extra features.
This is straightforward in frameworks like Keras, which are designed to deal with large datasets and scale well. In Keras, it would translate to something like the model definition below. To understand the code you would probably need to dive deeper into Keras, but many tutorials are available online.
from keras.models import Model
from keras.layers import Input, Dense, multiply, concatenate
# Data supplied by the user: x of shape (n, 1), microbiome of shape (n, k), y of length n
k = microbiome.shape[1]  # number of microbiome features
x_inp = Input(shape=(1,))
microbiome_inp = Input(shape=(k,))
# 3-layer neural network that maps the microbiome features to a single value
nn = Dense(200, activation='relu')(microbiome_inp)
nn = Dense(50, activation='relu')(nn)
nn = Dense(1)(nn)
# x*nn(microbiome)
mul = multiply([x_inp, nn])
# m1*x + m2*x*nn(microbiome) + b: the final Dense(1) holds m1, m2 (kernel) and b (bias)
conc = concatenate([x_inp, mul])
out = Dense(1)(conc)
model = Model(inputs=[x_inp, microbiome_inp], outputs=out)
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit([x, microbiome], y)
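To see why the final `Dense(1)` layer reproduces the assumed form, the forward pass of this architecture can be written out in plain NumPy. This is only a sketch: the weights are random rather than trained, the sizes `n` and `k` and the helper names (`relu`, `dense`, `hybrid_forward`) are made up for illustration, and the values of m1, m2 and b are set by hand.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def dense(a, W, b, activation=None):
    """Affine map followed by an optional activation, like a Dense layer."""
    z = a @ W + b
    return activation(z) if activation else z

def hybrid_forward(x, M, params):
    h = dense(M, *params["h1"], activation=relu)
    h = dense(h, *params["h2"], activation=relu)
    g = dense(h, *params["g"])                 # nn(microbiome), shape (n, 1)
    mul = x * g                                # x * nn(microbiome)
    conc = np.concatenate([x, mul], axis=1)    # columns: x, x * nn(microbiome)
    return dense(conc, *params["out"]), g      # final linear layer

rng = np.random.default_rng(0)
n, k = 8, 12                                   # hypothetical sizes
x = rng.normal(size=(n, 1))
M = rng.normal(size=(n, k))
params = {
    "h1": (rng.normal(size=(k, 200)), np.zeros(200)),
    "h2": (rng.normal(size=(200, 50)), np.zeros(50)),
    "g": (rng.normal(size=(50, 1)), np.zeros(1)),
    "out": (np.array([[2.0], [0.5]]), np.array([1.0])),  # m1, m2 and b by hand
}

y_hat, g = hybrid_forward(x, M, params)
m1, m2 = params["out"][0].ravel()
b = params["out"][1][0]
```

Here `y_hat` equals `m1*x + m2*x*g + b` exactly. In a fitted Keras model, the corresponding kernel and bias can be read from the last layer with `get_weights()`. Note that m2 and the scale of the network output are not separately identified, since multiplying one by a constant and dividing the other by it gives the same predictions, so only their product is interpretable.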
This is exactly what I am talking about, thanks! Can you link to any place that has a tutorial on how to specify and fit a model like this in TensorFlow?
– colin 8 hours ago
answered 8 hours ago by Tim♦ (edited 4 hours ago)