Choosing the embedding dimensionality for Entity Embeddings of categorical variables in Keras




    tf.keras.layers.Embedding(
        input_dim, output_dim, embeddings_initializer='uniform',
        embeddings_regularizer=None, activity_regularizer=None,
        embeddings_constraint=None, mask_zero=False, input_length=None, **kwargs
    )

input_dim: int > 0. Size of the vocabulary, i.e. maximum integer index + 1.

output_dim: int >= 0. Dimension of the dense embedding.
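As a minimal sketch of how these two parameters are used, the following builds an Embedding layer for a vocabulary of 81 categories mapped to 3-dimensional vectors (the sizes here are illustrative, matching the worked example later in this note):

```python
import numpy as np
import tensorflow as tf

# input_dim = vocabulary size (max integer index + 1), output_dim = embedding size
emb = tf.keras.layers.Embedding(input_dim=81, output_dim=3)

# A batch of integer category indexes; values must lie in [0, 80]
idx = np.array([[0, 12, 80]])
vectors = emb(idx)

print(vectors.shape)  # (1, 3, 3): one 3-d vector per input index
```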



Why is the embedding vector size 3 in our example? Well, the following “formula” provides a general rule of thumb about the number of embedding dimensions:

embedding_dimensions = number_of_categories**0.25

That is, the embedding vector dimension should be the 4th root of the number of categories. Since our vocabulary size in this example is 81, the recommended number of dimensions is 3:

3 = 81**0.25

Note that this is just a general guideline; you can set the number of embedding dimensions as you please.
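The fourth-root rule of thumb is easy to express directly (the helper name below is my own, not from the quoted guide):

```python
def rule_of_thumb_dim(num_categories: int) -> int:
    """Fourth-root rule of thumb for embedding dimensionality."""
    return round(num_categories ** 0.25)

print(rule_of_thumb_dim(81))    # 3
print(rule_of_thumb_dim(1000))  # 6, since 1000**0.25 is about 5.62
```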

The dimensions of the embedding layers D_i are hyperparameters that need to be pre-defined.

The bounds on the dimension of an entity embedding are 1 and m_i − 1, where m_i is the number of values of the categorical variable x_i.

In practice we chose the dimensions based on experiments. The following empirical guidelines are used during this process:

First, the more complex the variable, the more dimensions.

We roughly estimated how many features/aspects one might need to describe the entities and used that as the dimension to start with.

Second, if we had no clue about the first guideline, then we started with m_i − 1.






References

tf.keras.layers.Embedding | TensorFlow v2.15.0.post1
Turns positive integers (indexes) into dense vectors of fixed size.

3 Ways to Encode Categorical Variables for Deep Learning
Machine learning and deep learning models, like those in Keras, require all input and output variables to be numeric.

Introducing TensorFlow Feature Columns

Entity Embeddings of Categorical Variables
We map categorical variables in a function approximation problem into Euclidean spaces, which are the entity embeddings of the categorical variables.