sgnlp.models.ufd.utils.create_train_embeddings
create_train_embeddings(cfg: sgnlp.models.ufd.data_class.UFDArguments, tokenizer: sgnlp.models.ufd.tokenization.UFDTokenizer, model: sgnlp.models.ufd.modeling.UFDEmbeddingModel) → Dict
Helper function to generate training dataset for supervised and unsupervised training.
- Parameters
cfg (UFDArguments) – UFDArguments configuration loaded from the configuration file
tokenizer (UFDTokenizer) – UFD tokenizer class instance
model (UFDEmbeddingModel) – UFD embedding model class instance
- Returns
dictionary of dataset embeddings for the supervised and unsupervised datasets
- Return type
Dict
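
The call shape documented above can be sketched as follows. This is only an illustration, not the library's implementation: it does not import sgnlp, and `FakeArgs`, `FakeTokenizer`, `FakeEmbeddingModel`, and `create_train_embeddings_sketch` are hypothetical stand-ins. The field names on the config and the two dictionary keys (`"supervised"`, `"unsupervised"`) are assumptions, since the layout of the returned Dict is not specified here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeArgs:
    """Hypothetical stand-in for UFDArguments; field names are assumptions."""
    supervised_texts: List[str] = field(
        default_factory=lambda: ["good product", "bad service"]
    )
    unsupervised_texts: List[str] = field(
        default_factory=lambda: ["some unlabeled text"]
    )


class FakeTokenizer:
    """Hypothetical stand-in for UFDTokenizer: whitespace split only."""

    def __call__(self, text: str) -> List[str]:
        return text.split()


class FakeEmbeddingModel:
    """Hypothetical stand-in for UFDEmbeddingModel: one float per token."""

    def __call__(self, tokens: List[str]) -> List[float]:
        return [float(len(tok)) for tok in tokens]


def create_train_embeddings_sketch(cfg, tokenizer, model) -> Dict[str, List[List[float]]]:
    """Mirror the documented signature: (cfg, tokenizer, model) -> Dict."""

    def embed(texts: List[str]) -> List[List[float]]:
        # Tokenize each text, then pass the tokens through the model.
        return [model(tokenizer(t)) for t in texts]

    # Assumed layout: one entry per training mode.
    return {
        "supervised": embed(cfg.supervised_texts),
        "unsupervised": embed(cfg.unsupervised_texts),
    }


embeddings = create_train_embeddings_sketch(
    FakeArgs(), FakeTokenizer(), FakeEmbeddingModel()
)
for name, vectors in embeddings.items():
    print(name, len(vectors))
```

With the real library, the three arguments would instead be a loaded UFDArguments config, a UFDTokenizer instance, and a UFDEmbeddingModel instance, and the keys of the returned dictionary should be checked against the source.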