Machine Learning Golf: MultiplicationBuild an adding machine using NAND logic gatesBuild a multiplying machine using NAND logic gatesBuild a minifloat adding machine using NAND logic gatesHow to golf string output in ChefMeta-atomic code golfCollatz sequence on a two counter machineZipper multiplicationRegular expression parserDirichlet ConvolutionCan a neural network recognize primes?
Milky way is orbiting around?
Does the Defensive Duelist feat stack with the Warforged integrated protection?
Does Evolution Sage proliferate Blast Zone when played?
Should I cheat if the majority does it?
What is the difference between a historical drama and a period drama?
How to travel between two stationary worlds in the least amount of time? (time dilation)
Do intermediate subdomains need to exist?
Was Wolfgang Unzicker the last Amateur GM?
Explain how 'Sharing the burden' puzzle from Professor Layton and the Miracle Mask should be solved
How do both sides know the MTU
Olive oil in Japanese cooking
Contributing to a candidate as a Foreign National US Resident?
Why would a propellor have blades of different lengths?
Why do Klingons use cloaking devices?
How can I effectively map a multi-level dungeon?
What can a novel do that film and TV cannot?
What is the addition in the re-released version of Avengers: Endgame?
Can 4 Joy cons connect to the same Switch?
Data normalization before or after train-test split?
How to deal with administrative duties killing the research spirit?
Why do we need a bootloader separate than our application program in MCU's?
Has chattel slavery ever been used as a criminal punishment in the USA since the passage of the Thirteenth Amendment?
Magento 2 Professional Developer certification study guidelines
Do I need to be legally qualified to install a Hive smart thermostat?
Machine Learning Golf: Multiplication
Build an adding machine using NAND logic gatesBuild a multiplying machine using NAND logic gatesBuild a minifloat adding machine using NAND logic gatesHow to golf string output in ChefMeta-atomic code golfCollatz sequence on a two counter machineZipper multiplicationRegular expression parserDirichlet ConvolutionCan a neural network recognize primes?
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;
$begingroup$
I'd like to propose a different kind of golfing challenge to this community:
In the language and framework of your choice, design and train a neural network that, given $(x_1, x_2)$ calculates their product $x_1 cdot x_2$ for all integers $x_1, x_2$ between (and including) $-10$ and $10$.
Performance Goal
To qualify, your model may not deviate by more than $0.5$ from the correct result on any of those entries.
Rules
Your model
- must be a 'traditional' neural network (a node's value is calculated as a weighted linear combination of some of the nodes in a previous layer followed by an activation function),
- may only use the following standard activation functions:
$mathrmlinear(x) = x$,
$mathrmsoftmax(vecx)_i = frace^x_isum_j e^x_j$,
$mathrmselu_alpha, beta(x) = begincases
beta cdot x & text, if x > 0 \
alpha cdot beta (e^x -1 ) & text, otherwise
endcases$,
$mathrmsoftplus(x) = ln(e^x+1)$,
$mathrmleaky-relu_alpha(x) = begincases
x & text, if x < 0 \
alpha cdot x & text, otherwise
endcases$,
$tanh(x)$,
$mathrmsigmoid(x) = frace^xe^x+1$,
$mathrmhard-sigmoid(x) = begincases
0 & text, if x < -2.5 \
1 & text, if x > -2.5 \
0.2 cdot x + 0.5 & text, otherwise
endcases$,- $e^x$
- must take $(x_1, x_2)$ either as a tupel/vector/list/... of integers or floats as its only input,
- return the answer as an integer, float (or a suitable container, e.g. a vector or list, that contains this answer).
Your answer must include (or link to) all code necessary to check your results -- including the trained weights of your model.
Scoring
The neural network with the smallest number of weights (including bias weights) wins.
Enjoy!
arithmetic atomic-code-golf neural-networks
New contributor
Stefan Mesken is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
$endgroup$
|
show 8 more comments
$begingroup$
I'd like to propose a different kind of golfing challenge to this community:
In the language and framework of your choice, design and train a neural network that, given $(x_1, x_2)$ calculates their product $x_1 cdot x_2$ for all integers $x_1, x_2$ between (and including) $-10$ and $10$.
Performance Goal
To qualify, your model may not deviate by more than $0.5$ from the correct result on any of those entries.
Rules
Your model
- must be a 'traditional' neural network (a node's value is calculated as a weighted linear combination of some of the nodes in a previous layer followed by an activation function),
- may only use the following standard activation functions:
$mathrmlinear(x) = x$,
$mathrmsoftmax(vecx)_i = frace^x_isum_j e^x_j$,
$mathrmselu_alpha, beta(x) = begincases
beta cdot x & text, if x > 0 \
alpha cdot beta (e^x -1 ) & text, otherwise
endcases$,
$mathrmsoftplus(x) = ln(e^x+1)$,
$mathrmleaky-relu_alpha(x) = begincases
x & text, if x < 0 \
alpha cdot x & text, otherwise
endcases$,
$tanh(x)$,
$mathrmsigmoid(x) = frace^xe^x+1$,
$mathrmhard-sigmoid(x) = begincases
0 & text, if x < -2.5 \
1 & text, if x > -2.5 \
0.2 cdot x + 0.5 & text, otherwise
endcases$,- $e^x$
- must take $(x_1, x_2)$ either as a tupel/vector/list/... of integers or floats as its only input,
- return the answer as an integer, float (or a suitable container, e.g. a vector or list, that contains this answer).
Your answer must include (or link to) all code necessary to check your results -- including the trained weights of your model.
Scoring
The neural network with the smallest number of weights (including bias weights) wins.
Enjoy!
arithmetic atomic-code-golf neural-networks
New contributor
Stefan Mesken is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
$endgroup$
3
$begingroup$
Welcome to the site! I think this challenge could benefit a good deal from a more robust definition of a neural network. There are a couple of things here 1) It would be very nice for you to state it in language that does not already imply knowledge of NNs 2) You really should list the activation functions in your post rather than link out to an external source (outside links can change or disappear).
$endgroup$
– Sriotchilism O'Zaic
9 hours ago
1
$begingroup$
Can we have constant offsets? e.g., for a node with a single inputx, compute its value as ax+b? And if yes, does that count as 1 or 2 weights?
$endgroup$
– Grimy
7 hours ago
3
$begingroup$
Can we reuse weights/use convolutional layers? (I recommend removing the bonus, as it doesn't add anything to the challenge and just distracts from the main goal.) Are the weights supposed to be real or can they be complex?
$endgroup$
– flawr
7 hours ago
2
$begingroup$
Your wording implies nodes from layer 3 can't use inputs from layer 1. Does it cost a weight to have a layer 2 node simply doingf(x) = xto forward its input?
$endgroup$
– Grimy
7 hours ago
1
$begingroup$
@Grimy You're correct. My wording doesn't correctly reflect my intentions. I'll edit that. Skipping layers is allowed and if you do so with one of the listed activation functions, it doesn't cost you any weights.
$endgroup$
– Stefan Mesken
7 hours ago
|
show 8 more comments
$begingroup$
I'd like to propose a different kind of golfing challenge to this community:
In the language and framework of your choice, design and train a neural network that, given $(x_1, x_2)$ calculates their product $x_1 cdot x_2$ for all integers $x_1, x_2$ between (and including) $-10$ and $10$.
Performance Goal
To qualify, your model may not deviate by more than $0.5$ from the correct result on any of those entries.
Rules
Your model
- must be a 'traditional' neural network (a node's value is calculated as a weighted linear combination of some of the nodes in a previous layer followed by an activation function),
- may only use the following standard activation functions:
$mathrmlinear(x) = x$,
$mathrmsoftmax(vecx)_i = frace^x_isum_j e^x_j$,
$mathrmselu_alpha, beta(x) = begincases
beta cdot x & text, if x > 0 \
alpha cdot beta (e^x -1 ) & text, otherwise
endcases$,
$mathrmsoftplus(x) = ln(e^x+1)$,
$mathrmleaky-relu_alpha(x) = begincases
x & text, if x < 0 \
alpha cdot x & text, otherwise
endcases$,
$tanh(x)$,
$mathrmsigmoid(x) = frace^xe^x+1$,
$mathrmhard-sigmoid(x) = begincases
0 & text, if x < -2.5 \
1 & text, if x > -2.5 \
0.2 cdot x + 0.5 & text, otherwise
endcases$,- $e^x$
- must take $(x_1, x_2)$ either as a tupel/vector/list/... of integers or floats as its only input,
- return the answer as an integer, float (or a suitable container, e.g. a vector or list, that contains this answer).
Your answer must include (or link to) all code necessary to check your results -- including the trained weights of your model.
Scoring
The neural network with the smallest number of weights (including bias weights) wins.
Enjoy!
arithmetic atomic-code-golf neural-networks
New contributor
Stefan Mesken is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
$endgroup$
I'd like to propose a different kind of golfing challenge to this community:
In the language and framework of your choice, design and train a neural network that, given $(x_1, x_2)$ calculates their product $x_1 cdot x_2$ for all integers $x_1, x_2$ between (and including) $-10$ and $10$.
Performance Goal
To qualify, your model may not deviate by more than $0.5$ from the correct result on any of those entries.
Rules
Your model
- must be a 'traditional' neural network (a node's value is calculated as a weighted linear combination of some of the nodes in a previous layer followed by an activation function),
- may only use the following standard activation functions:
$mathrmlinear(x) = x$,
$mathrmsoftmax(vecx)_i = frace^x_isum_j e^x_j$,
$mathrmselu_alpha, beta(x) = begincases
beta cdot x & text, if x > 0 \
alpha cdot beta (e^x -1 ) & text, otherwise
endcases$,
$mathrmsoftplus(x) = ln(e^x+1)$,
$mathrmleaky-relu_alpha(x) = begincases
x & text, if x < 0 \
alpha cdot x & text, otherwise
endcases$,
$tanh(x)$,
$mathrmsigmoid(x) = frace^xe^x+1$,
$mathrmhard-sigmoid(x) = begincases
0 & text, if x < -2.5 \
1 & text, if x > -2.5 \
0.2 cdot x + 0.5 & text, otherwise
endcases$,- $e^x$
- must take $(x_1, x_2)$ either as a tupel/vector/list/... of integers or floats as its only input,
- return the answer as an integer, float (or a suitable container, e.g. a vector or list, that contains this answer).
Your answer must include (or link to) all code necessary to check your results -- including the trained weights of your model.
Scoring
The neural network with the smallest number of weights (including bias weights) wins.
Enjoy!
arithmetic atomic-code-golf neural-networks
arithmetic atomic-code-golf neural-networks
New contributor
Stefan Mesken is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
New contributor
Stefan Mesken is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
edited 4 hours ago
Expired Data
1,5454 silver badges21 bronze badges
1,5454 silver badges21 bronze badges
New contributor
Stefan Mesken is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
asked 10 hours ago
Stefan MeskenStefan Mesken
1616 bronze badges
1616 bronze badges
New contributor
Stefan Mesken is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
New contributor
Stefan Mesken is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
3
$begingroup$
Welcome to the site! I think this challenge could benefit a good deal from a more robust definition of a neural network. There are a couple of things here 1) It would be very nice for you to state it in language that does not already imply knowledge of NNs 2) You really should list the activation functions in your post rather than link out to an external source (outside links can change or disappear).
$endgroup$
– Sriotchilism O'Zaic
9 hours ago
1
$begingroup$
Can we have constant offsets? e.g., for a node with a single inputx, compute its value as ax+b? And if yes, does that count as 1 or 2 weights?
$endgroup$
– Grimy
7 hours ago
3
$begingroup$
Can we reuse weights/use convolutional layers? (I recommend removing the bonus, as it doesn't add anything to the challenge and just distracts from the main goal.) Are the weights supposed to be real or can they be complex?
$endgroup$
– flawr
7 hours ago
2
$begingroup$
Your wording implies nodes from layer 3 can't use inputs from layer 1. Does it cost a weight to have a layer 2 node simply doingf(x) = xto forward its input?
$endgroup$
– Grimy
7 hours ago
1
$begingroup$
@Grimy You're correct. My wording doesn't correctly reflect my intentions. I'll edit that. Skipping layers is allowed and if you do so with one of the listed activation functions, it doesn't cost you any weights.
$endgroup$
– Stefan Mesken
7 hours ago
|
show 8 more comments
3
$begingroup$
Welcome to the site! I think this challenge could benefit a good deal from a more robust definition of a neural network. There are a couple of things here 1) It would be very nice for you to state it in language that does not already imply knowledge of NNs 2) You really should list the activation functions in your post rather than link out to an external source (outside links can change or disappear).
$endgroup$
– Sriotchilism O'Zaic
9 hours ago
1
$begingroup$
Can we have constant offsets? e.g., for a node with a single inputx, compute its value as ax+b? And if yes, does that count as 1 or 2 weights?
$endgroup$
– Grimy
7 hours ago
3
$begingroup$
Can we reuse weights/use convolutional layers? (I recommend removing the bonus, as it doesn't add anything to the challenge and just distracts from the main goal.) Are the weights supposed to be real or can they be complex?
$endgroup$
– flawr
7 hours ago
2
$begingroup$
Your wording implies nodes from layer 3 can't use inputs from layer 1. Does it cost a weight to have a layer 2 node simply doingf(x) = xto forward its input?
$endgroup$
– Grimy
7 hours ago
1
$begingroup$
@Grimy You're correct. My wording doesn't correctly reflect my intentions. I'll edit that. Skipping layers is allowed and if you do so with one of the listed activation functions, it doesn't cost you any weights.
$endgroup$
– Stefan Mesken
7 hours ago
3
3
$begingroup$
Welcome to the site! I think this challenge could benefit a good deal from a more robust definition of a neural network. There are a couple of things here 1) It would be very nice for you to state it in language that does not already imply knowledge of NNs 2) You really should list the activation functions in your post rather than link out to an external source (outside links can change or disappear).
$endgroup$
– Sriotchilism O'Zaic
9 hours ago
$begingroup$
Welcome to the site! I think this challenge could benefit a good deal from a more robust definition of a neural network. There are a couple of things here 1) It would be very nice for you to state it in language that does not already imply knowledge of NNs 2) You really should list the activation functions in your post rather than link out to an external source (outside links can change or disappear).
$endgroup$
– Sriotchilism O'Zaic
9 hours ago
1
1
$begingroup$
Can we have constant offsets? e.g., for a node with a single input
x, compute its value as ax+b? And if yes, does that count as 1 or 2 weights?$endgroup$
– Grimy
7 hours ago
$begingroup$
Can we have constant offsets? e.g., for a node with a single input
x, compute its value as ax+b? And if yes, does that count as 1 or 2 weights?$endgroup$
– Grimy
7 hours ago
3
3
$begingroup$
Can we reuse weights/use convolutional layers? (I recommend removing the bonus, as it doesn't add anything to the challenge and just distracts from the main goal.) Are the weights supposed to be real or can they be complex?
$endgroup$
– flawr
7 hours ago
$begingroup$
Can we reuse weights/use convolutional layers? (I recommend removing the bonus, as it doesn't add anything to the challenge and just distracts from the main goal.) Are the weights supposed to be real or can they be complex?
$endgroup$
– flawr
7 hours ago
2
2
$begingroup$
Your wording implies nodes from layer 3 can't use inputs from layer 1. Does it cost a weight to have a layer 2 node simply doing
f(x) = x to forward its input?$endgroup$
– Grimy
7 hours ago
$begingroup$
Your wording implies nodes from layer 3 can't use inputs from layer 1. Does it cost a weight to have a layer 2 node simply doing
f(x) = x to forward its input?$endgroup$
– Grimy
7 hours ago
1
1
$begingroup$
@Grimy You're correct. My wording doesn't correctly reflect my intentions. I'll edit that. Skipping layers is allowed and if you do so with one of the listed activation functions, it doesn't cost you any weights.
$endgroup$
– Stefan Mesken
7 hours ago
$begingroup$
@Grimy You're correct. My wording doesn't correctly reflect my intentions. I'll edit that. Skipping layers is allowed and if you do so with one of the listed activation functions, it doesn't cost you any weights.
$endgroup$
– Stefan Mesken
7 hours ago
|
show 8 more comments
2 Answers
2
active
oldest
votes
$begingroup$
33 31 weights
# Activation functions
sub hard $_[0] < -2.5 ? 0 : $_[0] > 2.5 ? 1 : 0.2 * $_[0] + 0.5
sub linear $_[0]
# Layer 0
sub inputA() $a
sub inputB() $b
# Layer 1
sub a15() hard(5*inputA)
# Layer 2
sub a8() hard(-5*inputA + 75*a15 - 37.5)
# Layer 3
sub aa() linear(-5*inputA + 75*a15 - 40*a8)
# Layer 4
sub a4() hard(aa - 17.5)
# Layer 5
sub a2() hard(aa - 20*a4 - 7.5)
# Layer 6
sub a1() linear(0.2*aa - 4*a4 - 2*a2)
# Layer 7
sub b15() hard(0.25*inputB - 5*a15)
sub b8() hard(0.25*inputB - 5*a8)
sub b4() hard(0.25*inputB - 5*a4)
sub b2() hard(0.25*inputB - 5*a2)
sub b1() hard(0.25*inputB - 5*a1)
# Layer 8
sub output() linear(-300*b15 + 160*b8 + 80*b4 + 40*b2 + 20*b1 - 10*inputA)
# Test
for $a (-10..10)
for $b (-10..10)
die if abs($a * $b - output) >= 0.5;
print "All OK";
Try it online!
This does long multiplication in (sorta) binary, and thus returns the exact result. It should be possible to take advantage of the 0.5 error window to golf this some more, but I'm not sure how.
Layers 1 to 6 decompose the first input in 5 "bits". For golfing reasons, we don't use actual binary. The most significant "bit" has weight -15 instead of 16, and when the input is 0, all the "bits" are 0.5 (which still works out fine, since it preserves the identity inputA = -15*a15 + 8*a8 + 4*a4 + 2*a2 + 1*a1).
$endgroup$
1
$begingroup$
I did expect someone to come up with a hard-coded, ANN-ified multiplication algorithm. But I didn't think it would be the first response. Well done! (I'm also eager to see whether you'll be able to pull something like this off with the MNIST dataset or some other, more relastic ML problem :D.)
$endgroup$
– Stefan Mesken
6 hours ago
add a comment |
$begingroup$
21 13 weights
This is based on the polarization identity of bilinear forms which in the one dimensional real case reduces to the polynomial identity:
$$ xcdot y = frac(x+y)^2 - (x-y)^24$$
So y1 just computes [x+y, x-y] using a linear transformation, and y3 is just the absolute value of y1 as a preprocessing step for the next one: Then the "hard" part is computing the squares which I will explain below, and after that just computing a difference and scaling which is again a linear operation.
To compute the squares I use an exponential series $s$ which should be accurate for all integers $0,1,2,ldots,20$ within around $0.5$. This series is of the form
$$ textapprox_square(x) = sum_i=0^2 w_i exp(0.0001 cdot i cdot x)$$
where I just optimized for the weights W2 ($=(w_i)_i$). This whole approximation comprises again just two linear transformations with an exponential activation sandwiched in between. This approach results in a maximal deviation of about 0.02.
function p = net(x)
relu = @(x)max(0,x);
% Linear: 4 weights
y1 = [1,1;1,-1]*x;
% Linear + ReLu + Sum: 1 weight (the -1):
y2 = -1 * y1;
y3 = relu(y1) + relu(y2);
% Linear + exp: 3 weights:
W1 = (0:2)*0.0001;
y4 = exp(y3 * W1);
% Linear: 3 weights:
W2 = [99700114.4299316;-199400468.100687;99700353.6313757];
y5 = y4 * W2;
% Linear : 2 weights:
p = [0.25,-0.25]*y5;
end
Try it online!
$endgroup$
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "200"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Stefan Mesken is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodegolf.stackexchange.com%2fquestions%2f187562%2fmachine-learning-golf-multiplication%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
33 31 weights
# Activation functions
sub hard $_[0] < -2.5 ? 0 : $_[0] > 2.5 ? 1 : 0.2 * $_[0] + 0.5
sub linear $_[0]
# Layer 0
sub inputA() $a
sub inputB() $b
# Layer 1
sub a15() hard(5*inputA)
# Layer 2
sub a8() hard(-5*inputA + 75*a15 - 37.5)
# Layer 3
sub aa() linear(-5*inputA + 75*a15 - 40*a8)
# Layer 4
sub a4() hard(aa - 17.5)
# Layer 5
sub a2() hard(aa - 20*a4 - 7.5)
# Layer 6
sub a1() linear(0.2*aa - 4*a4 - 2*a2)
# Layer 7
sub b15() hard(0.25*inputB - 5*a15)
sub b8() hard(0.25*inputB - 5*a8)
sub b4() hard(0.25*inputB - 5*a4)
sub b2() hard(0.25*inputB - 5*a2)
sub b1() hard(0.25*inputB - 5*a1)
# Layer 8
sub output() linear(-300*b15 + 160*b8 + 80*b4 + 40*b2 + 20*b1 - 10*inputA)
# Test
for $a (-10..10)
for $b (-10..10)
die if abs($a * $b - output) >= 0.5;
print "All OK";
Try it online!
This does long multiplication in (sorta) binary, and thus returns the exact result. It should be possible to take advantage of the 0.5 error window to golf this some more, but I'm not sure how.
Layers 1 to 6 decompose the first input in 5 "bits". For golfing reasons, we don't use actual binary. The most significant "bit" has weight -15 instead of 16, and when the input is 0, all the "bits" are 0.5 (which still works out fine, since it preserves the identity inputA = -15*a15 + 8*a8 + 4*a4 + 2*a2 + 1*a1).
$endgroup$
1
$begingroup$
I did expect someone to come up with a hard-coded, ANN-ified multiplication algorithm. But I didn't think it would be the first response. Well done! (I'm also eager to see whether you'll be able to pull something like this off with the MNIST dataset or some other, more relastic ML problem :D.)
$endgroup$
– Stefan Mesken
6 hours ago
add a comment |
$begingroup$
33 31 weights
# Activation functions
sub hard $_[0] < -2.5 ? 0 : $_[0] > 2.5 ? 1 : 0.2 * $_[0] + 0.5
sub linear $_[0]
# Layer 0
sub inputA() $a
sub inputB() $b
# Layer 1
sub a15() hard(5*inputA)
# Layer 2
sub a8() hard(-5*inputA + 75*a15 - 37.5)
# Layer 3
sub aa() linear(-5*inputA + 75*a15 - 40*a8)
# Layer 4
sub a4() hard(aa - 17.5)
# Layer 5
sub a2() hard(aa - 20*a4 - 7.5)
# Layer 6
sub a1() linear(0.2*aa - 4*a4 - 2*a2)
# Layer 7
sub b15() hard(0.25*inputB - 5*a15)
sub b8() hard(0.25*inputB - 5*a8)
sub b4() hard(0.25*inputB - 5*a4)
sub b2() hard(0.25*inputB - 5*a2)
sub b1() hard(0.25*inputB - 5*a1)
# Layer 8
sub output() linear(-300*b15 + 160*b8 + 80*b4 + 40*b2 + 20*b1 - 10*inputA)
# Test
for $a (-10..10)
for $b (-10..10)
die if abs($a * $b - output) >= 0.5;
print "All OK";
Try it online!
This does long multiplication in (sorta) binary, and thus returns the exact result. It should be possible to take advantage of the 0.5 error window to golf this some more, but I'm not sure how.
Layers 1 to 6 decompose the first input in 5 "bits". For golfing reasons, we don't use actual binary. The most significant "bit" has weight -15 instead of 16, and when the input is 0, all the "bits" are 0.5 (which still works out fine, since it preserves the identity inputA = -15*a15 + 8*a8 + 4*a4 + 2*a2 + 1*a1).
$endgroup$
1
$begingroup$
I did expect someone to come up with a hard-coded, ANN-ified multiplication algorithm. But I didn't think it would be the first response. Well done! (I'm also eager to see whether you'll be able to pull something like this off with the MNIST dataset or some other, more relastic ML problem :D.)
$endgroup$
– Stefan Mesken
6 hours ago
add a comment |
$begingroup$
33 31 weights
# Activation functions
sub hard $_[0] < -2.5 ? 0 : $_[0] > 2.5 ? 1 : 0.2 * $_[0] + 0.5
sub linear $_[0]
# Layer 0
sub inputA() $a
sub inputB() $b
# Layer 1
sub a15() hard(5*inputA)
# Layer 2
sub a8() hard(-5*inputA + 75*a15 - 37.5)
# Layer 3
sub aa() linear(-5*inputA + 75*a15 - 40*a8)
# Layer 4
sub a4() hard(aa - 17.5)
# Layer 5
sub a2() hard(aa - 20*a4 - 7.5)
# Layer 6
sub a1() linear(0.2*aa - 4*a4 - 2*a2)
# Layer 7
sub b15() hard(0.25*inputB - 5*a15)
sub b8() hard(0.25*inputB - 5*a8)
sub b4() hard(0.25*inputB - 5*a4)
sub b2() hard(0.25*inputB - 5*a2)
sub b1() hard(0.25*inputB - 5*a1)
# Layer 8
sub output() linear(-300*b15 + 160*b8 + 80*b4 + 40*b2 + 20*b1 - 10*inputA)
# Test
for $a (-10..10)
for $b (-10..10)
die if abs($a * $b - output) >= 0.5;
print "All OK";
Try it online!
This does long multiplication in (sorta) binary, and thus returns the exact result. It should be possible to take advantage of the 0.5 error window to golf this some more, but I'm not sure how.
Layers 1 to 6 decompose the first input in 5 "bits". For golfing reasons, we don't use actual binary. The most significant "bit" has weight -15 instead of 16, and when the input is 0, all the "bits" are 0.5 (which still works out fine, since it preserves the identity inputA = -15*a15 + 8*a8 + 4*a4 + 2*a2 + 1*a1).
$endgroup$
33 31 weights
# Activation functions
sub hard $_[0] < -2.5 ? 0 : $_[0] > 2.5 ? 1 : 0.2 * $_[0] + 0.5
sub linear $_[0]
# Layer 0
sub inputA() $a
sub inputB() $b
# Layer 1
sub a15() hard(5*inputA)
# Layer 2
sub a8() hard(-5*inputA + 75*a15 - 37.5)
# Layer 3
sub aa() linear(-5*inputA + 75*a15 - 40*a8)
# Layer 4
sub a4() hard(aa - 17.5)
# Layer 5
sub a2() hard(aa - 20*a4 - 7.5)
# Layer 6
sub a1() linear(0.2*aa - 4*a4 - 2*a2)
# Layer 7
sub b15() hard(0.25*inputB - 5*a15)
sub b8() hard(0.25*inputB - 5*a8)
sub b4() hard(0.25*inputB - 5*a4)
sub b2() hard(0.25*inputB - 5*a2)
sub b1() hard(0.25*inputB - 5*a1)
# Layer 8
sub output() linear(-300*b15 + 160*b8 + 80*b4 + 40*b2 + 20*b1 - 10*inputA)
# Test
for $a (-10..10)
for $b (-10..10)
die if abs($a * $b - output) >= 0.5;
print "All OK";
Try it online!
This does long multiplication in (sorta) binary, and thus returns the exact result. It should be possible to take advantage of the 0.5 error window to golf this some more, but I'm not sure how.
Layers 1 to 6 decompose the first input in 5 "bits". For golfing reasons, we don't use actual binary. The most significant "bit" has weight -15 instead of 16, and when the input is 0, all the "bits" are 0.5 (which still works out fine, since it preserves the identity inputA = -15*a15 + 8*a8 + 4*a4 + 2*a2 + 1*a1).
edited 5 hours ago
answered 6 hours ago
GrimyGrimy
4,71112 silver badges26 bronze badges
4,71112 silver badges26 bronze badges
1
$begingroup$
I did expect someone to come up with a hard-coded, ANN-ified multiplication algorithm. But I didn't think it would be the first response. Well done! (I'm also eager to see whether you'll be able to pull something like this off with the MNIST dataset or some other, more relastic ML problem :D.)
$endgroup$
– Stefan Mesken
6 hours ago
add a comment |
1
$begingroup$
I did expect someone to come up with a hard-coded, ANN-ified multiplication algorithm. But I didn't think it would be the first response. Well done! (I'm also eager to see whether you'll be able to pull something like this off with the MNIST dataset or some other, more relastic ML problem :D.)
$endgroup$
– Stefan Mesken
6 hours ago
1
1
$begingroup$
I did expect someone to come up with a hard-coded, ANN-ified multiplication algorithm. But I didn't think it would be the first response. Well done! (I'm also eager to see whether you'll be able to pull something like this off with the MNIST dataset or some other, more relastic ML problem :D.)
$endgroup$
– Stefan Mesken
6 hours ago
$begingroup$
I did expect someone to come up with a hard-coded, ANN-ified multiplication algorithm. But I didn't think it would be the first response. Well done! (I'm also eager to see whether you'll be able to pull something like this off with the MNIST dataset or some other, more relastic ML problem :D.)
$endgroup$
– Stefan Mesken
6 hours ago
add a comment |
$begingroup$
21 13 weights
This is based on the polarization identity of bilinear forms which in the one dimensional real case reduces to the polynomial identity:
$$ xcdot y = frac(x+y)^2 - (x-y)^24$$
So y1 just computes [x+y, x-y] using a linear transformation, and y3 is just the absolute value of y1 as a preprocessing step for the next one: Then the "hard" part is computing the squares which I will explain below, and after that just computing a difference and scaling which is again a linear operation.
To compute the squares I use an exponential series $s$ which should be accurate for all integers $0,1,2,ldots,20$ within around $0.5$. This series is of the form
$$ textapprox_square(x) = sum_i=0^2 w_i exp(0.0001 cdot i cdot x)$$
where I just optimized for the weights W2 ($=(w_i)_i$). This whole approximation comprises again just two linear transformations with an exponential activation sandwiched in between. This approach results in a maximal deviation of about 0.02.
function p = net(x)
relu = @(x)max(0,x);
% Linear: 4 weights
y1 = [1,1;1,-1]*x;
% Linear + ReLu + Sum: 1 weight (the -1):
y2 = -1 * y1;
y3 = relu(y1) + relu(y2);
% Linear + exp: 3 weights:
W1 = (0:2)*0.0001;
y4 = exp(y3 * W1);
% Linear: 3 weights:
W2 = [99700114.4299316;-199400468.100687;99700353.6313757];
y5 = y4 * W2;
% Linear : 2 weights:
p = [0.25,-0.25]*y5;
end
Try it online!
$endgroup$
add a comment |
$begingroup$
21 13 weights
This is based on the polarization identity of bilinear forms which in the one dimensional real case reduces to the polynomial identity:
$$ xcdot y = frac(x+y)^2 - (x-y)^24$$
So y1 just computes [x+y, x-y] using a linear transformation, and y3 is just the absolute value of y1 as a preprocessing step for the next one: Then the "hard" part is computing the squares which I will explain below, and after that just computing a difference and scaling which is again a linear operation.
To compute the squares I use an exponential series $s$ which should be accurate for all integers $0,1,2,ldots,20$ within around $0.5$. This series is of the form
$$ textapprox_square(x) = sum_i=0^2 w_i exp(0.0001 cdot i cdot x)$$
where I just optimized for the weights W2 ($=(w_i)_i$). This whole approximation comprises again just two linear transformations with an exponential activation sandwiched in between. This approach results in a maximal deviation of about 0.02.
function p = net(x)
relu = @(x)max(0,x);
% Linear: 4 weights
y1 = [1,1;1,-1]*x;
% Linear + ReLu + Sum: 1 weight (the -1):
y2 = -1 * y1;
y3 = relu(y1) + relu(y2);
% Linear + exp: 3 weights:
W1 = (0:2)*0.0001;
y4 = exp(y3 * W1);
% Linear: 3 weights:
W2 = [99700114.4299316;-199400468.100687;99700353.6313757];
y5 = y4 * W2;
% Linear : 2 weights:
p = [0.25,-0.25]*y5;
end
Try it online!
$endgroup$
add a comment |
$begingroup$
21 13 weights
This is based on the polarization identity of bilinear forms which in the one dimensional real case reduces to the polynomial identity:
$$ xcdot y = frac(x+y)^2 - (x-y)^24$$
So y1 just computes [x+y, x-y] using a linear transformation, and y3 is just the absolute value of y1 as a preprocessing step for the next one: Then the "hard" part is computing the squares which I will explain below, and after that just computing a difference and scaling which is again a linear operation.
To compute the squares I use an exponential series $s$ which should be accurate for all integers $0,1,2,ldots,20$ within around $0.5$. This series is of the form
$$ textapprox_square(x) = sum_i=0^2 w_i exp(0.0001 cdot i cdot x)$$
where I just optimized for the weights W2 ($=(w_i)_i$). This whole approximation comprises again just two linear transformations with an exponential activation sandwiched in between. This approach results in a maximal deviation of about 0.02.
function p = net(x)
relu = @(x)max(0,x);
% Linear: 4 weights
y1 = [1,1;1,-1]*x;
% Linear + ReLu + Sum: 1 weight (the -1):
y2 = -1 * y1;
y3 = relu(y1) + relu(y2);
% Linear + exp: 3 weights:
W1 = (0:2)*0.0001;
y4 = exp(y3 * W1);
% Linear: 3 weights:
W2 = [99700114.4299316;-199400468.100687;99700353.6313757];
y5 = y4 * W2;
% Linear : 2 weights:
p = [0.25,-0.25]*y5;
end
Try it online!
$endgroup$
21 13 weights
This is based on the polarization identity of bilinear forms which in the one dimensional real case reduces to the polynomial identity:
$$ xcdot y = frac(x+y)^2 - (x-y)^24$$
So y1 just computes [x+y, x-y] using a linear transformation, and y3 is just the absolute value of y1 as a preprocessing step for the next one: Then the "hard" part is computing the squares which I will explain below, and after that just computing a difference and scaling which is again a linear operation.
To compute the squares I use an exponential series $s$ which should be accurate for all integers $0,1,2,ldots,20$ within around $0.5$. This series is of the form
$$ textapprox_square(x) = sum_i=0^2 w_i exp(0.0001 cdot i cdot x)$$
where I just optimized for the weights W2 ($=(w_i)_i$). This whole approximation comprises again just two linear transformations with an exponential activation sandwiched in between. This approach results in a maximal deviation of about 0.02.
function p = net(x)
relu = @(x)max(0,x);
% Linear: 4 weights
y1 = [1,1;1,-1]*x;
% Linear + ReLu + Sum: 1 weight (the -1):
y2 = -1 * y1;
y3 = relu(y1) + relu(y2);
% Linear + exp: 3 weights:
W1 = (0:2)*0.0001;
y4 = exp(y3 * W1);
% Linear: 3 weights:
W2 = [99700114.4299316;-199400468.100687;99700353.6313757];
y5 = y4 * W2;
% Linear : 2 weights:
p = [0.25,-0.25]*y5;
end
Try it online!
edited 55 mins ago
answered 1 hour ago
flawrflawr
28.1k6 gold badges73 silver badges198 bronze badges
28.1k6 gold badges73 silver badges198 bronze badges
add a comment |
add a comment |
Stefan Mesken is a new contributor. Be nice, and check out our Code of Conduct.
Stefan Mesken is a new contributor. Be nice, and check out our Code of Conduct.
Stefan Mesken is a new contributor. Be nice, and check out our Code of Conduct.
Stefan Mesken is a new contributor. Be nice, and check out our Code of Conduct.
If this is an answer to a challenge…
…Be sure to follow the challenge specification. However, please refrain from exploiting obvious loopholes. Answers abusing any of the standard loopholes are considered invalid. If you think a specification is unclear or underspecified, comment on the question instead.
…Try to optimize your score. For instance, answers to code-golf challenges should attempt to be as short as possible. You can always include a readable version of the code in addition to the competitive one.
Explanations of your answer make it more interesting to read and are very much encouraged.…Include a short header which indicates the language(s) of your code and its score, as defined by the challenge.
More generally…
…Please make sure to answer the question and provide sufficient detail.
…Avoid asking for help, clarification or responding to other answers (use comments instead).
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodegolf.stackexchange.com%2fquestions%2f187562%2fmachine-learning-golf-multiplication%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
3
$begingroup$
Welcome to the site! I think this challenge could benefit a good deal from a more robust definition of a neural network. There are a couple of things here 1) It would be very nice for you to state it in language that does not already imply knowledge of NNs 2) You really should list the activation functions in your post rather than link out to an external source (outside links can change or disappear).
$endgroup$
– Sriotchilism O'Zaic
9 hours ago
1
$begingroup$
Can we have constant offsets? e.g., for a node with a single input
x, compute its value as ax+b? And if yes, does that count as 1 or 2 weights?$endgroup$
– Grimy
7 hours ago
3
$begingroup$
Can we reuse weights/use convolutional layers? (I recommend removing the bonus, as it doesn't add anything to the challenge and just distracts from the main goal.) Are the weights supposed to be real or can they be complex?
$endgroup$
– flawr
7 hours ago
2
$begingroup$
Your wording implies nodes from layer 3 can't use inputs from layer 1. Does it cost a weight to have a layer 2 node simply doing
f(x) = xto forward its input?$endgroup$
– Grimy
7 hours ago
1
$begingroup$
@Grimy You're correct. My wording doesn't correctly reflect my intentions. I'll edit that. Skipping layers is allowed and if you do so with one of the listed activation functions, it doesn't cost you any weights.
$endgroup$
– Stefan Mesken
7 hours ago