A.) The algorithm can only learn patterns that are present in the data, so if you want the output to follow a specific structural form of language, it makes sense to curate your training data so that it contains only sentences of that structure.
If you tune your hyper-parameters so that the model is less complex than the default, it should work on smaller datasets. But this relies on the assumption that what you are trying to learn is itself less complex (as opposed to learning from a large corpus of Nietzsche). See the sketch below for what "less complex" can mean in practice.
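If you are starting from the standard Keras character-level text-generation setup, shrinking the network is a small change. The following is only a minimal sketch under that assumption; `maxlen`, `chars`, and the 32-unit size are illustrative placeholders, not recommended values.

```python
# Minimal sketch of a deliberately small character-level LSTM in Keras.
# `maxlen` and `chars` are placeholders for whatever your preprocessing produces.
from tensorflow import keras
from tensorflow.keras import layers

maxlen = 40                                    # length of each input sequence
chars = sorted(set("your curated text here"))  # stand-in for the real vocabulary

model = keras.Sequential([
    keras.Input(shape=(maxlen, len(chars))),   # one-hot encoded character windows
    layers.LSTM(32),                           # far fewer units than the usual 128
    layers.Dense(len(chars), activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()
```

Fewer LSTM units (and fewer layers) means fewer parameters to fit, which is usually a better match for a small, narrowly curated dataset.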
B.) What is happening is that you are overfitting the training data, so the LSTM isn't generalizing to your intended goal. In essence, overfitting means the model is learning irrelevant details that happen, by chance, to predict the target in the training data. To alleviate this, use a validation set and stop training when performance on it stops improving.
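As a concrete way to do that, assuming the Keras setup sketched in A: `validation_split` holds back a slice of the training data, and the `EarlyStopping` callback halts training when the held-out loss stops improving. Here `x`, `y`, and `model` are placeholders for your own encoded arrays and the model above.

```python
# Hedged sketch: hold out part of the data as a validation set and stop
# training once the validation loss stops improving.
from tensorflow import keras

early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss",         # track the held-out loss, not the training loss
    patience=3,                 # allow a few non-improving epochs before stopping
    restore_best_weights=True,  # roll back to the best weights seen so far
)

model.fit(
    x, y,
    batch_size=128,
    epochs=60,
    validation_split=0.1,       # reserve 10% of the data for validation
    callbacks=[early_stop],
)
```

If training loss keeps dropping while `val_loss` rises, that is the overfitting described above.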
And finally (but very importantly), these machine learning techniques don't learn the way humans do. To understand why, you would need to dig into the math behind neural networks; I can't think of a good intuitive explanation.