Same model performs very differently in Keras and Flux


In a class I'm taking, the professor gave us two datasets, one of 301 late-type galaxies and the other of 301 early-type galaxies, and we built a model in Keras to differentiate between them:

from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.models import Model

input_img = Input(shape=(128,128,3))

x = Conv2D(filters = 16, kernel_size= (3,3), strides = (1,1), activation='relu', padding = 'same')(input_img)
x = MaxPooling2D((2,2),padding = 'same')(x)

x = Conv2D(filters = 32, kernel_size= (3,3), strides = (1,1), activation='relu', padding = 'same')(x)
x = MaxPooling2D((2,2),padding = 'same')(x)

x = Conv2D(filters = 64, kernel_size= (3,3), strides = (1,1), activation='relu', padding = 'same')(x)
x = MaxPooling2D((2,2),padding = 'same')(x)

x = Flatten()(x)
x = Dense(32, activation = 'relu')(x)
x = Dropout(0.3)(x)
x = Dense(16, activation = 'relu')(x)
out = Dense(1, activation = 'sigmoid')(x)

model = Model(inputs = input_img, outputs = out)
model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
history = model.fit(X_train, Y_train, batch_size = 32, epochs = 20)

Since I like Julia more than Python, I tried to build the same model in Flux.jl. According to what I read in the Flux docs, this is what the model looks like:

using Flux

model2 = Chain(
    Conv((3, 3), 3 => 16, relu, pad=SamePad(), stride=(1, 1)),
    MaxPool((2,2), pad=SamePad()),
    Conv((3, 3), 16 => 32, relu, pad=SamePad(), stride=(1, 1)),
    MaxPool((2,2), pad=SamePad()),
    Conv((3, 3), 32 => 64, relu, pad=SamePad(), stride=(1, 1)),
    MaxPool((2,2), pad=SamePad()),
    Flux.flatten,
    Dense(16384 => 32, relu),
    Dense(32 => 16, relu),

    Dense(16 => 1),
    sigmoid
)

But when I train the models under what I think are the same conditions, I get very different results. In Keras the final loss after 20 epochs is 0.0267, while in Flux the loss after 30 epochs is 0.4082335f0. I don't know where this difference in loss could come from, since I'm using the same batch size in both models and (I think) the same data treatment. Python:

import numpy as np

X1 = np.load('/home/luis/Descargas/cosmo-late.npy')
X2 = np.load('/home/luis/Descargas/cosmo-early.npy')
X = np.concatenate((X1,X2), axis = 0).astype(np.float32)/256.0
Y = np.zeros(X.shape[0])
Y[0:len(X1)] = 1
rand_ind = np.arange(0,X.shape[0])
np.random.shuffle(rand_ind)
X = X[rand_ind]
Y = Y[rand_ind]
X_train = X[50:]
Y_train = Y[50:]
X_test = X[0:50]
Y_test = Y[0:50]

Julia:

using NPZ, Random

X1 = npzread("./Descargas/cosmo-late.npy")
X2 = npzread("./Descargas/cosmo-early.npy")
X = cat(X1,X2,dims=1)
X = Float32.(X)./256
Y = zeros(1,size(X)[1])
Y[1,1:length(X1[:,1,1,1])] .= 1
ind = collect(1:length(Y[1,:]))
shuffle!(ind)
X = X[ind,:,:,:]
Y = Y[:,ind]
X_train = X[51:length(X[:,1,1,1]),:,:,:]
Y_train = Y[:,51:length(Y)]
X_test = X[1:50,:,:,:]
Y_test = Y[:,1:50]
X_train = permutedims(X_train, (2, 3, 4, 1))
X_test = permutedims(X_test, (2, 3, 4, 1))

And the training in Julia goes:

using ChainRules   # for ChainRules.ignore_derivatives below

train_set = Flux.DataLoader((X_train, Y_train), batchsize=32)
loss(x, y) = Flux.logitbinarycrossentropy(x, y)
opt = Flux.setup(Adam(), model2)
loss_history = Float32[]

for epoch = 1:30
    Flux.train!(model2, train_set, opt) do m,x,y
        err = loss(m(x), y)
        ChainRules.ignore_derivatives() do
            push!(loss_history, err)
        end
        return err
    end
end

Can anyone please help me? I can't figure it out.


There are 2 answers

Answer from Albin Heimerson (accepted):

Based on my comment about skipping the sigmoid when using logitbinarycrossentropy, I had a quick go at testing this on some scrap data. With your current implementation I also ended up at a loss of about 0.5, while after removing the sigmoid I reached much lower values.

You can also choose to keep the sigmoid and use binarycrossentropy instead, though that seems to be less numerically stable, so it is better to do it with logitbinarycrossentropy.
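To illustrate, here is a minimal sketch of the fix, reusing the layer sizes and variable names from the question (they are the asker's, not verified here): drop the final sigmoid from the Chain so that logitbinarycrossentropy receives raw logits, and apply the sigmoid explicitly only when you want probabilities.

model2 = Chain(
    Conv((3, 3), 3 => 16, relu, pad=SamePad()),
    MaxPool((2, 2), pad=SamePad()),
    Conv((3, 3), 16 => 32, relu, pad=SamePad()),
    MaxPool((2, 2), pad=SamePad()),
    Conv((3, 3), 32 => 64, relu, pad=SamePad()),
    MaxPool((2, 2), pad=SamePad()),
    Flux.flatten,
    Dense(16384 => 32, relu),
    Dense(32 => 16, relu),
    Dense(16 => 1),                     # raw logit output, no sigmoid here
)

loss(ŷ, y) = Flux.logitbinarycrossentropy(ŷ, y)   # expects logits, applies the sigmoid internally

# when you want probabilities (e.g. on the test set), apply the sigmoid yourself:
probs = Flux.sigmoid.(model2(X_test))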

Answer from Hermione:

After viewing your code, I don't think the model weights can be updated in Julia: the model should be included in your loss function. Here is an example of how to set up your loss function:

using LinearAlgebra: norm

loss3(model, x, y) = norm(model(x) .- y)        # the model is the first argument

P.S. There is also a simpler syntax for Flux's train!:

train!(loss, model, data, opt_state)

I hope this helps; the code above comes from the help text of ?Flux.train! in Julia.
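For example, here is a minimal sketch of that simpler call using the names from the question (model2, train_set, opt), under the assumption that the final sigmoid has been removed from the Chain as the accepted answer suggests:

# the loss takes the model as its first argument, so train! can update its parameters
loss(m, x, y) = Flux.logitbinarycrossentropy(m(x), y)

opt = Flux.setup(Adam(), model2)

for epoch in 1:30
    Flux.train!(loss, model2, train_set, opt)
end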