
Converting transformer to predict time series code from Transformers = "~0.1.15" to Transformers = "~0.2.8" #196

Open
marthinkondjeni opened this issue Sep 22, 2024 · 5 comments


@marthinkondjeni

Hi all, can anyone help me convert the "Use transformer to predict time series" code (GitHub - iwasnothing/julia_transformer_ts: Time series prediction using transformer in Julia) from Transformers = "~0.1.15" to Transformers = "~0.2.8"?

@chengchingwen
Owner

The code seems to be modified from the old Transformers tutorial; you can compare it with the newer code in the tutorial.

@marthinkondjeni
Author

Based on the tutorial, the `encode` call in `preprocess(sample) = todevice(encode(textenc, sample[1], sample[2]))` encodes a batch of segments with the text encoder for general transformers. How do I encode an embedding vector from `Flux.Embedding()` for general transformers?

@chengchingwen
Owner

Can you elaborate? The `preprocess(sample)` is for handling String data (tokenization + one-hot), but you usually don't do that for time series. As for the transformer, it doesn't change regardless of whether the embedding vectors come from `Flux.Embedding`.
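
For example (a minimal illustration, not from the package; the vocabulary size and dimensions below are made up), the transformer layers only ever see a feature array of shape (feature dims, sequence length, batch size), no matter how it was produced:

```julia
using Flux

# Token ids through Flux.Embedding: hypothetical vocabulary of 1000, 64 features.
emb = Flux.Embedding(1000 => 64)
tokens = rand(1:1000, 20, 8)            # (sequence length, batch size)
text_features = emb(tokens)             # (64, 20, 8)

# Numeric time series through a plain Dense feature layer.
dense = Dense(1 => 64)
series = rand(Float32, 1, 20, 8)        # (1, sequence length, batch size)
ts_features = dense(series)             # also (64, 20, 8)

# Either array can be fed to the transformer as the hidden state.
```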

@marthinkondjeni
Author

Assume I have a source and a target Array{Float64}:

src = [101.959 101.979 102.492 102.892 103.333 … 49.408 50.0367 50.7113 51.5407]
trg = [51.5407 52.3407 53.2327 54.2827 … 19.9873 20.0787 20.1707 20.204 20.2]

How can I process the above for the loss function in the tutorial, shown below?

```julia
function loss(input)
    enc = encoder_forward(input.encoder_input)
    logits = decoder_forward(input.decoder_input, enc)
    ce_loss = shift_decode_loss(logits, input.decoder_input.token, input.decoder_input.attention_mask)
    return ce_loss
end
```




@chengchingwen
Owner

You can put the sample in a nested NamedTuple. Since your data is a sequence of float numbers, I think there is no need for an embedding layer.

There are other things to tweak, but in general:

  1. Given a batch of samples, preprocess them to have the shape (1, sequence length, batch size). Then the source sequence is src = seq[:, begin:end-1, :] and the target sequence is trg = seq[:, begin+1:end, :].
  2. Since there is no embedding layer, you need a feature layer (could be Dense or Conv) to map samples of shape (1, sequence length, batch size) to features of shape (feature dims, sequence length, batch size), like the encoder_input_layer in the time series notebook. Apply the feature layer to both: src_feature = feature_layer(src) and trg_feature = feature_layer(trg).
  3. Put the processed samples in a nested NamedTuple like input = (encoder_input = (hidden_state = src_feature,), decoder_input = (hidden_state = trg_feature,)).
  4. Since the data are numbers, you need to replace the shift_decode_loss (which is a modified cross entropy loss) with something like the mean squared error (a rough sketch follows this list).
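
Putting those steps together, a minimal sketch might look like the following. The sizes, the `feature_dims` and `output_layer` names, and the `model` argument are all assumptions, not from the notebook: `model` stands for whatever encoder-decoder you build with Transformers.jl that accepts this nested NamedTuple, and `output_layer` maps the decoder features back to one value per time step so the MSE can be computed against trg.

```julia
using Flux

# Hypothetical batch of a univariate series: (1, sequence length, batch size).
seq = rand(Float32, 1, 128, 32)

# 1. Shift by one step to get source and target sequences.
src = seq[:, begin:end-1, :]
trg = seq[:, begin+1:end, :]

# 2. Feature layer mapping (1, len, batch) -> (feature dims, len, batch).
feature_dims = 64                        # assumed size
feature_layer = Dense(1 => feature_dims)
src_feature = feature_layer(src)
trg_feature = feature_layer(trg)

# 3. Pack the features into the nested NamedTuple.
input = (encoder_input = (hidden_state = src_feature,),
         decoder_input = (hidden_state = trg_feature,))

# 4. Mean squared error instead of the shifted cross entropy loss.
output_layer = Dense(feature_dims => 1)

function loss(model, output_layer, input, trg)
    out = model(input)                       # assumed to return a NamedTuple with hidden_state
    pred = output_layer(out.hidden_state)    # (1, sequence length, batch size)
    return Flux.Losses.mse(pred, trg)
end
```

The exact return value of the model forward depends on how you assemble the encoder-decoder stack, so adjust the last two lines to however you pull out the decoder hidden states.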
