data-efficient LLM reasoning training