python - How to use a pretrained Word2Vec model in TensorFlow
I have a Word2Vec model trained in gensim. How can I use it in TensorFlow as word embeddings? I don't want to train the embeddings from scratch in TensorFlow. Can anyone show me how with some example code?
Let's assume you have a dictionary and an inverse_dict list, with the index in the list corresponding to the most common words:
    vocab = {'hello': 0, 'world': 2, 'neural': 1, 'networks': 3}
    inv_dict = ['hello', 'neural', 'world', 'networks']
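As an aside, if you only have the vocab dictionary, the inverse list can be built programmatically; a minimal sketch, assuming the indices form a contiguous range 0..len(vocab)-1:

    # Sort the (word, index) pairs by index and keep just the words
    inv_dict = [word for word, idx in sorted(vocab.items(), key=lambda kv: kv[1])]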
Notice how each inv_dict index corresponds to the dictionary's values. Now declare your embedding matrix and fill in its values:
    import numpy as np
    from gensim.models.keyedvectors import KeyedVectors

    vocab_size = len(inv_dict)
    emb_size = 300  # or whatever the size of your embeddings is
    embeddings = np.zeros((vocab_size, emb_size))

    model = KeyedVectors.load_word2vec_format('embeddings_file', binary=True)
    for k, v in vocab.items():
        embeddings[v] = model[k]  # copy each pretrained vector into its row
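One caveat worth hedging (my addition, not part of the original answer): model[k] raises a KeyError for any vocab word that is missing from the pretrained file, so a safer variant of the loop checks membership first and leaves missing rows as zeros:

    for k, v in vocab.items():
        if k in model:                # KeyedVectors supports membership tests
            embeddings[v] = model[k]  # copy the pretrained vector
        # else: the row stays all zeros; a random init is another common choice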
Now you've got your embeddings matrix. Good. Let's assume you want to train on the sample x = ['hello', 'world']. But this doesn't work for our neural net; we need to integerize it:
    x_train = []
    for word in x:
        x_train.append(vocab[word])  # integerize
    x_train = np.array(x_train)      # make into a numpy array
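The same integerization can also be written as a one-liner (equivalent sketch):

    x_train = np.array([vocab[word] for word in x])  # integerize in one step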
Now we are good to go with embedding our samples on-the-fly:
    import tensorflow as tf

    # input_size = number of words per sample
    x_model = tf.placeholder(tf.int32, shape=[None, input_size])
    with tf.device("/cpu:0"):
        embedded_x = tf.nn.embedding_lookup(embeddings, x_model)
Now embedded_x goes into your convolution or whatever. I am also assuming that you are not retraining the embeddings, but simply using them. Hope that helps.
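If you want to make the "not retraining" part explicit in the graph, one option (my variant, using the same TF 1.x API as above) is to wrap the matrix in a constant so no gradients can flow into it:

    emb_matrix = tf.constant(embeddings, dtype=tf.float32)  # frozen: not a trainable Variable
    with tf.device("/cpu:0"):
        embedded_x = tf.nn.embedding_lookup(emb_matrix, x_model)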