Deep Learning is a branch of Machine Learning; it is essentially a fancy name for neural networks. A neural network is a machine learning technique inspired by how computation happens in our brains.
Machine learning can be characterized as learning to make predictions based on past observations.
Given a large set of data, a traditional machine learning model performs a single transformation on the data, mapping inputs to outputs while minimizing a loss function.
Deep Learning approaches learn not only to predict but also to represent the data in a form suitable for prediction: the data is fed into a network that applies successive transformations to the input until a final transformation predicts the output.
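The contrast between a single transformation and successive transformations can be sketched in a few lines of NumPy. The dimensions and random weights here are illustrative placeholders, not part of any real model:

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=4)          # a 4-dimensional input example

# Traditional (linear) model: one transformation from input to output.
W = rng.normal(size=(1, 4))
linear_pred = W @ x

# Deep model: successive transformations, each followed by a non-linearity,
# with a final transformation producing the prediction.
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(8, 8))
W3 = rng.normal(size=(1, 8))

h1 = np.tanh(W1 @ x)            # first learned representation of the input
h2 = np.tanh(W2 @ h1)           # second learned representation
deep_pred = W3 @ h2             # final transformation predicts the output
```

In training, the intermediate weights `W1` and `W2` are adjusted along with `W3`, so the network learns the representations `h1` and `h2` rather than having them hand-designed.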
Deep Learning in Natural Language Processing
Deep Learning provides powerful machine learning techniques that can be applied to Natural Language problems.
A major component in neural networks for language is the use of an embedding layer.
Embedding Layer: a mapping of discrete symbols to continuous vectors in a relatively low-dimensional space.
When words are embedded, they are transformed from isolated distinct symbols into mathematical objects that can be operated on. In particular, distance between vectors can be equated to distance between words, making it easier to generalize behavior from one word to another. This representation of words as vectors is learned by the network as part of the training process. Going up the hierarchy, the network also learns to combine word vectors in a way that is useful for prediction.
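A minimal sketch of an embedding lookup, using a toy vocabulary and hand-picked vectors (in a real model the table entries would be learned during training, not fixed like this):

```python
import numpy as np

# Hypothetical vocabulary and embedding table for illustration only.
vocab = {"cat": 0, "dog": 1, "car": 2}
E = np.array([
    [0.9, 0.8, 0.1],    # "cat"
    [0.85, 0.75, 0.2],  # "dog" -- close to "cat" in vector space
    [0.1, 0.2, 0.9],    # "car" -- far from both
])

def embed(word):
    """Map a discrete symbol to its continuous vector."""
    return E[vocab[word]]

def distance(u, v):
    """Euclidean distance between two word vectors."""
    return np.linalg.norm(u - v)

# Distance between vectors stands in for dissimilarity between words.
print(distance(embed("cat"), embed("dog")))  # small
print(distance(embed("cat"), embed("car")))  # large
```

Because "cat" and "dog" sit near each other in the vector space, behavior the network learns for one can generalize to the other, which is exactly the property the paragraph above describes.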
Neural Network Architectures
Feed Forward Neural Network: Feed forward networks, in particular multi-layer perceptrons, allow us to work with fixed-sized inputs, or with variable-length inputs in which the order of the elements can be disregarded. When fed a set of input components, the network learns to combine them in a meaningful way. Multi-layer perceptrons can be used wherever a linear model was previously used. The non-linearity of the network, as well as its ability to easily integrate pre-trained word embeddings, often leads to superior classification accuracy.
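A two-layer perceptron forward pass can be sketched as below; the input and layer sizes are arbitrary choices for illustration, and the random weights stand in for trained parameters:

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Two-layer perceptron: linear map, non-linearity, linear map."""
    h = np.tanh(W1 @ x + b1)  # hidden representation of the input
    return W2 @ h + b2        # output scores

rng = np.random.default_rng(1)
x = rng.normal(size=10)       # fixed-size input, e.g. averaged word vectors

W1, b1 = rng.normal(size=(16, 10)), np.zeros(16)
W2, b2 = rng.normal(size=(3, 16)), np.zeros(3)

scores = mlp_forward(x, W1, b1, W2, b2)  # e.g. scores over 3 classes
print(scores.shape)
```

Replacing the `np.tanh` with the identity would collapse the two layers into a single linear map, which is why the non-linearity is what gives the network its extra power over a linear model.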
Convolutional feed forward networks: Convolutional feed forward networks are specialized architectures that excel at extracting local patterns in the data: they are fed arbitrarily-sized inputs and are capable of extracting meaningful local patterns that are sensitive to word order, regardless of where they appear in the input. This makes them work very well for identifying informative local patterns in the input.
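The core operation can be sketched as a single filter sliding over windows of word vectors, followed by max-pooling. The sentence length, embedding size, and window size below are toy values:

```python
import numpy as np

def conv1d_maxpool(X, F):
    """Slide one filter over windows of word vectors and max-pool.

    X: (sequence_length, embedding_dim) word vectors for a sentence
    F: (window_size, embedding_dim) a filter detecting one local pattern
    """
    k = F.shape[0]
    # One activation per window of k consecutive words.
    acts = [np.sum(X[i:i + k] * F) for i in range(X.shape[0] - k + 1)]
    # Max-pooling keeps the strongest match, wherever in the input it occurs.
    return max(acts)

rng = np.random.default_rng(2)
X = rng.normal(size=(7, 4))  # a 7-word sentence with 4-dim embeddings
F = rng.normal(size=(3, 4))  # a filter sensitive to one 3-word pattern

print(conv1d_maxpool(X, F))
```

The window multiplication makes the filter sensitive to the order of words inside the window, while the max over all positions makes the result independent of where in the sentence the pattern appears.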
Recurrent Neural Network: Recurrent neural networks are specialized models for sequential data. They are network components that take a sequence of items as input and produce a fixed-size vector that summarizes that sequence. As "summarizing a sequence" means different things for different tasks (e.g., the information needed to answer a question about the sentiment of a sentence is different from the information needed to answer a question about its grammaticality), recurrent networks are rarely used as standalone components.
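The "sequence in, fixed-size vector out" behavior can be sketched with a simple (Elman-style) recurrence; the hidden and input sizes are arbitrary illustrative choices:

```python
import numpy as np

def rnn_summarize(X, W_h, W_x, b):
    """Read a sequence of vectors and return one fixed-size summary vector.

    X: (sequence_length, input_dim). The hidden state h carries information
    forward step by step, so the summary is sensitive to item order.
    """
    h = np.zeros(W_h.shape[0])
    for x in X:                        # one update per item in the sequence
        h = np.tanh(W_h @ h + W_x @ x + b)
    return h                           # fixed size, whatever the sequence length

rng = np.random.default_rng(3)
W_h = rng.normal(size=(5, 5))          # hidden-to-hidden weights
W_x = rng.normal(size=(5, 4))          # input-to-hidden weights
b = np.zeros(5)

short = rnn_summarize(rng.normal(size=(3, 4)), W_h, W_x, b)
long = rnn_summarize(rng.normal(size=(10, 4)), W_h, W_x, b)
print(short.shape, long.shape)         # same shape for both lengths
```

In practice this summary vector is passed on to other components, such as a feed forward classifier, which then decides what aspect of the sequence (sentiment, grammaticality, and so on) the summary is used for.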
Reference: Neural Network Methods in Natural Language Processing by Yoav Goldberg.