
Converting transformer to predict time series code from Transformers = "~0.1.15" to Transformers = "~0.2.8" #196

Open
marthinkondjeni opened this issue Sep 22, 2024 · 5 comments


@marthinkondjeni

Hi all, can anyone help me convert the "Use transformer to predict time series" code (GitHub - iwasnothing/julia_transformer_ts: Time series prediction using transformer in Julia) from Transformers = "~0.1.15" to Transformers = "~0.2.8"?

@chengchingwen
Owner

The code seems to be modified from the old Transformers tutorial; you can compare it with the newer code in the tutorial.

@marthinkondjeni
Author

Based on the tutorial, the `encode` call in `preprocess(sample) = todevice(encode(textenc, sample[1], sample[2]))` encodes a batch of segments with the text encoder for general transformers. How do I encode an embedding vector from `Flux.Embedding()` for general transformers?

@chengchingwen
Owner

Can you elaborate? The `preprocess(sample)` is for handling String data (tokenization + one-hot), but you usually don't do that for time series. As for the transformer, it doesn't change regardless of whether the embedding vectors come from `Flux.Embedding`.
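
For example (a minimal illustration, not from the package; the vocabulary size and dimensions below are made up), the transformer layers only ever see a feature array of shape (feature dims, sequence length, batch size), no matter how it was produced:

```julia
using Flux

# Token ids through Flux.Embedding: hypothetical vocabulary of 1000, 64 features.
emb = Flux.Embedding(1000 => 64)
tokens = rand(1:1000, 20, 8)            # (sequence length, batch size)
text_features = emb(tokens)             # (64, 20, 8)

# Numeric time series through a plain Dense feature layer.
dense = Dense(1 => 64)
series = rand(Float32, 1, 20, 8)        # (1, sequence length, batch size)
ts_features = dense(series)             # also (64, 20, 8)

# Either array can be fed to the transformer as the hidden state.
```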

@marthinkondjeni
Author

Assume I have a source and a target Array{Float64}:

src = [101.959 101.979 102.492 102.892 103.333 … 49.408 50.0367 50.7113 51.5407]
trg = [51.5407 52.3407 53.2327 54.2827 … 19.9873 20.0787 20.1707 20.204 20.2]

How can I process the above for the loss function in the tutorial, shown below?

```julia
function loss(input)
    enc = encoder_forward(input.encoder_input)
    logits = decoder_forward(input.decoder_input, enc)
    ce_loss = shift_decode_loss(logits, input.decoder_input.token, input.decoder_input.attention_mask)
    return ce_loss
end
```




@chengchingwen
Owner

You can put the sample in a nested NamedTuple. Since your data is a sequence of float numbers, I think there is no need for an embedding layer.

There are other things to tweak, but in general:

  1. Given a batch of samples, preprocess them to have the shape (1, sequence length, batch size). Then the source sequence is src = seq[:, begin:end-1, :] and the target sequence is trg = seq[:, begin+1:end, :].
  2. Since there is no embedding layer, you need a feature layer (could be Dense or Conv) to map samples of shape (1, sequence length, batch size) to features of shape (feature dims, sequence length, batch size), like the encoder_input_layer in the time series notebook. Apply the feature layer to both: src_feature = feature_layer(src) and trg_feature = feature_layer(trg).
  3. Put the processed samples in a nested NamedTuple like input = (encoder_input = (hidden_state = src_feature,), decoder_input = (hidden_state = trg_feature,)).
  4. Since the data are numbers, you need to replace the shift_decode_loss (which is a modified cross entropy loss) with something like the mean squared error (a rough sketch follows this list).
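
Putting those steps together, a minimal sketch might look like the following. The sizes, the `feature_dims` and `output_layer` names, and the `model` argument are all assumptions, not from the notebook: `model` stands for whatever encoder-decoder you build with Transformers.jl that accepts this nested NamedTuple, and `output_layer` maps the decoder features back to one value per time step so the MSE can be computed against trg.

```julia
using Flux

# Hypothetical batch of a univariate series: (1, sequence length, batch size).
seq = rand(Float32, 1, 128, 32)

# 1. Shift by one step to get source and target sequences.
src = seq[:, begin:end-1, :]
trg = seq[:, begin+1:end, :]

# 2. Feature layer mapping (1, len, batch) -> (feature dims, len, batch).
feature_dims = 64                        # assumed size
feature_layer = Dense(1 => feature_dims)
src_feature = feature_layer(src)
trg_feature = feature_layer(trg)

# 3. Pack the features into the nested NamedTuple.
input = (encoder_input = (hidden_state = src_feature,),
         decoder_input = (hidden_state = trg_feature,))

# 4. Mean squared error instead of the shifted cross entropy loss.
output_layer = Dense(feature_dims => 1)

function loss(model, output_layer, input, trg)
    out = model(input)                       # assumed to return a NamedTuple with hidden_state
    pred = output_layer(out.hidden_state)    # (1, sequence length, batch size)
    return Flux.Losses.mse(pred, trg)
end
```

The exact return value of the model forward depends on how you assemble the encoder-decoder stack, so adjust the last two lines to however you pull out the decoder hidden states.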
