Alpha-R1

Alpha-R1 is a reasoning-enhanced large language model (LLM) for quantitative alpha selection, trained with Group Relative Policy Optimization (GRPO) on top of Qwen3-8B.

It is the official implementation accompanying the paper:

Alpha-R1: Alpha Screening with LLM Reasoning via Reinforcement Learning


Overview

Alpha-R1 is designed for Alpha Screening rather than general-purpose conversation.

Unlike conventional LLMs, Alpha-R1 learns to reason over

  • Financial factor descriptions
  • Historical price trends
  • Market news
  • Portfolio constraints

to produce interpretable alpha-selection decisions.

The model is optimized with reinforcement learning, using trading performance as the reward signal, to strengthen its reasoning on quantitative investment tasks.


Highlights

  • 🧠 Reinforcement-learning aligned financial reasoning
  • 📈 Multi-modal market understanding (news + quantitative factors + price)
  • 📊 Strong generalization across different asset pools
  • 💰 Optimized for alpha generation instead of language modeling

In the backtests reported in the paper, Alpha-R1 achieves:

Asset Pool   Annual Return   Sharpe Ratio   Max Drawdown
CSI300       27.59%          1.62           6.76%
CSI1000      78.18%          4.03           9.25%
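
For reference, the sketch below shows how the three reported metrics are commonly computed from a series of daily portfolio returns. It makes three assumptions: 252 trading days per year, a zero risk-free rate, and simple (non-log) returns. The function names are illustrative, and the paper's backtesting protocol may define the metrics differently.

```python
import math
from typing import Sequence

TRADING_DAYS = 252  # assumption: annualization factor for daily returns


def annual_return(daily: Sequence[float]) -> float:
    """Geometric annualized return from simple daily returns."""
    growth = 1.0
    for r in daily:
        growth *= 1.0 + r
    return growth ** (TRADING_DAYS / len(daily)) - 1.0


def sharpe_ratio(daily: Sequence[float]) -> float:
    """Annualized Sharpe ratio, assuming a zero risk-free rate (needs >= 2 days)."""
    n = len(daily)
    mean = sum(daily) / n
    variance = sum((r - mean) ** 2 for r in daily) / (n - 1)
    return mean / math.sqrt(variance) * math.sqrt(TRADING_DAYS)


def max_drawdown(daily: Sequence[float]) -> float:
    """Largest peak-to-trough decline of the equity curve, as a positive fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in daily:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst
```

Max drawdown is returned as a positive fraction, matching how the table above reports it (for example 0.0676 for 6.76%).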

Model Details

Item            Value
Base Model      Qwen3-8B
Model Type      Causal Language Model
Training        Reinforcement Learning Fine-tuning
Domain          Quantitative Finance
Language        English / Chinese
Intended Task   Alpha Selection & Financial Reasoning

Training

Alpha-R1 is initialized from Qwen3-8B and further optimized using GRPO.

The training objective encourages the model to generate reasoning trajectories that maximize downstream portfolio performance instead of only predicting next tokens.
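
The "group relative" part of GRPO works as follows:

1. Several reasoning trajectories are sampled for the same prompt.
2. Each trajectory is scored.
3. Each reward is normalized against the group's mean and standard deviation, which removes the need for a learned critic.

This is a simplified sketch of the standard GRPO advantage only. It omits the clipped policy-ratio loss and the KL penalty. The reward values are hypothetical, and the exact normalization used to train Alpha-R1 is described in the paper.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Advantage of each sampled completion relative to its own group.

    Completions scoring above the group mean get positive advantages and are
    reinforced; those below the mean get negative advantages.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical example: four reasoning trajectories for one market snapshot,
# each scored by the portfolio reward of the factors it selected.
rewards = [0.8, 0.2, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Here the first trajectory gets an advantage of roughly +1.41 and the second roughly -1.41. The two average trajectories get 0, so they are neither reinforced nor penalized.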

The model is trained using:

  • Financial factor descriptions
  • Historical price information
  • Market news
  • Trading rewards derived from portfolio returns

More details can be found in the accompanying paper.

Training Data

Alpha-R1 was trained on a proprietary financial reasoning dataset constructed by the authors. Rather than relying on an existing benchmark, the training data was generated through a multi-stage pipeline that integrates quantitative market information with semantic reasoning.

The data construction process consists of the following stages:

  1. Market Data Abstraction. Historical market observations were transformed into structured textual descriptions, including price-based market summaries derived from technical indicators, trading activity, and sector rotation, as well as news-based market summaries generated from financial news and macroeconomic events.

  2. Iterative Market Memory Construction. Weekly market descriptions were recursively summarized by an LLM to build a long-term historical market memory, enabling the model to reason over evolving market regimes instead of isolated daily observations.

  3. Factor Profiling. A dynamic factor zoo was constructed from computationally feasible Alpha101 factors. Each factor was systematically backtested over historical data to obtain quantitative performance statistics, including return, volatility, and decay characteristics. These statistics, together with the historical market memory, were used to generate semantic factor descriptions that explain the economic intuition, applicable market regimes, and potential limitations of each factor.

  4. Reasoning Training Samples. Each training sample contains:

    • the current market state,
    • semantic descriptions of candidate factors,
    • historical market memory,
    • and the corresponding factor candidates to be screened.

    During training, candidate factors were randomly sampled from the full factor pool to encourage reasoning and generalization rather than memorization of specific factors.

  5. Reinforcement Learning Signals. Instead of human preference annotations, Alpha-R1 employs objective market feedback as supervision. Rewards are computed from realized portfolio performance using a linear reward model, allowing the reasoning policy to be optimized through GRPO toward superior risk-adjusted investment performance.
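
Stage 5 says only that the reward is linear in realized portfolio performance. The sketch below shows the general shape of such a reward: a weighted sum of portfolio metrics for the factors a trajectory selected. The metric names and weights are hypothetical placeholders, not the terms or values used to train Alpha-R1.

```python
from typing import Dict

# Hypothetical metrics and weights; the actual reward terms are given in the paper.
REWARD_WEIGHTS: Dict[str, float] = {
    "excess_return": 1.0,
    "sharpe_ratio": 0.5,
    "max_drawdown": -1.0,  # drawdown (a positive fraction) is penalized
}


def linear_reward(metrics: Dict[str, float],
                  weights: Dict[str, float] = REWARD_WEIGHTS) -> float:
    """Linear reward: weighted sum of the realized portfolio metrics."""
    missing = set(weights) - set(metrics)
    if missing:
        raise KeyError(f"missing metrics: {sorted(missing)}")
    return sum(w * metrics[name] for name, w in weights.items())


# Example: metrics realized by one trajectory's selected factors (made-up numbers).
reward = linear_reward({"excess_return": 0.02, "sharpe_ratio": 1.2, "max_drawdown": 0.05})
```

Rewards like this one, computed for every trajectory in a sampled group, are what GRPO then normalizes into group-relative advantages.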

The training dataset is internally constructed for research purposes and is not publicly released.
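
The factor-profiling stage backtests each factor to obtain return, volatility, and decay statistics, but the exact procedure is not given on this card. The minimal numpy sketch below assumes the following:

- The input is a (days × stocks) panel of factor values and close prices.
- Return and volatility come from a daily top-minus-bottom quantile long-short portfolio.
- Decay is measured as mean cross-sectional rank IC at several forward horizons.

The function names, the 20% quantile, and the horizons are illustrative choices.

```python
import numpy as np


def _ranks(x: np.ndarray) -> np.ndarray:
    """Ranks 0..n-1 of a 1-D array (ties broken by position; fine for a sketch)."""
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="stable")] = np.arange(len(x))
    return ranks


def rank_ic(factor: np.ndarray, forward_return: np.ndarray) -> float:
    """Cross-sectional Spearman correlation for a single day."""
    return float(np.corrcoef(_ranks(factor), _ranks(forward_return))[0, 1])


def profile_factor(values, prices, horizons=(1, 5, 10), quantile=0.2):
    """Backtest statistics for one factor on a (days x stocks) panel.

    values[t, j]: factor value of stock j known at the close of day t
    prices[t, j]: close price of stock j on day t
    """
    values = np.asarray(values, dtype=float)
    prices = np.asarray(prices, dtype=float)
    n_days, n_stocks = values.shape
    k = max(1, int(n_stocks * quantile))  # stocks per leg

    # Daily long-short return: top-k factor values minus bottom-k, held one day.
    next_day = prices[1:] / prices[:-1] - 1.0
    long_short = []
    for t in range(n_days - 1):
        order = np.argsort(values[t], kind="stable")
        long_short.append(next_day[t, order[-k:]].mean() - next_day[t, order[:k]].mean())
    long_short = np.array(long_short)

    # Decay profile: mean rank IC against h-day forward returns for growing h.
    ic_decay = {}
    for h in horizons:
        forward = prices[h:] / prices[:-h] - 1.0
        ic_decay[h] = float(np.mean([rank_ic(values[t], forward[t]) for t in range(n_days - h)]))

    return {
        "mean_daily_return": float(long_short.mean()),
        "daily_volatility": float(long_short.std(ddof=1)),
        "ic_decay": ic_decay,
    }
```

An IC that falls quickly as the horizon grows suggests a short-lived signal. Statistics like these are the kind of input the card describes feeding, together with the market memory, into the semantic factor descriptions.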

Intended Use

Alpha-R1 is intended solely for alpha screening in quantitative investment research. Given the current market state, historical market memory, and semantic descriptions of candidate alpha factors, the model identifies factors that are more likely to generate excess returns under the prevailing market regime.

The model is designed as a research tool for factor selection and should be used together with downstream portfolio construction, risk management, and execution systems. It does not generate trading signals, execute trades, provide investment advice, or manage portfolios autonomously.


Usage

Alpha-R1 is designed for alpha screening in quantitative investment research rather than general-purpose conversation.

Given:

  • Current market conditions
  • Historical market memory
  • Candidate alpha factor descriptions
  • Asset universe information

the model reasons about factor effectiveness under the prevailing market regime and selects factors that are more likely to generate excess returns.

Example

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "FinStep-AI/Alpha-R1"

tokenizer = AutoTokenizer.from_pretrained(model_name)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

model.eval()

system_prompt = """
You are a senior quantitative investment expert, skilled in selecting
the most suitable alpha factor combinations based on market environment
and asset characteristics.

You need to analyze current market conditions, the characteristics of
each factor, and asset portfolio situations to provide scientific and
reasonable factor selection recommendations.
"""

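# Prompt template: every {placeholder} below (dates, market data, news,
# factor descriptions, asset pool) must be filled with real data, e.g. with
# str.format, before the prompt is sent to the model.
user_prompt = """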
Based on the following information, select the most suitable factor
combinations for {target_date}'s trading day for a {holding_days}-day
short-term strategy stock selection
(buy at market open, sell at market close after {holding_days} trading days).

Target Date:
{target_date}

Market Environment Information:

• Previous Trading Day Closing Data ({previous_trading_day}):
{market_price_data}

• Previous Trading Day Market Analysis ({previous_trading_day}):
{market_analysis}

• Current Day Pre-Market News ({target_date}):
{financial_news}

Available Factor Descriptions:

• {factor_1_name}:
{factor_1_description}

• {factor_2_name}:
{factor_2_description}

• {factor_3_name}:
{factor_3_description}

...

Asset Portfolio Information:

{asset_pool_information}

Analysis Framework:

1. Analyze each factor's nature and characteristics.
2. Evaluate factors' expected performance in the current market.
3. Consider portfolio characteristics for further screening.
4. Make the final selection (maximum 10 factors).

Output Requirements:

• Provide detailed analytical reasoning first.
• Output XML-tagged factor list:
  <alpha_list><alpha001>...</alpha_list>
• Maximum 10 factors allowed.
• Skip selection if no factors are expected to yield positive returns.
"""

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(
    text,
    return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=4096,
        do_sample=True,  # sampling mode; temperature/top_p are ignored under greedy decoding
        temperature=0.6,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,
    )

response = tokenizer.decode(
    outputs[0][inputs.input_ids.shape[1]:],
    skip_special_tokens=True,
)

print(response)
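
The example template hard-codes three numbered factor placeholders followed by "...". One way to handle a variable number of candidates is to:

1. Render the factor block separately.
2. Substitute it into a single placeholder.
3. Refuse to send a prompt that still contains unfilled placeholders.

This is an assumption, not part of the released code. The helper names and the {factor_block} placeholder are hypothetical.

```python
import string
from typing import Iterable, Set, Tuple


def template_fields(template: str) -> Set[str]:
    """Names of all {placeholders} in a str.format template."""
    return {field for _, field, _, _ in string.Formatter().parse(template) if field}


def build_factor_block(factors: Iterable[Tuple[str, str]]) -> str:
    """Render (name, description) pairs in the bullet layout of the example prompt."""
    return "\n\n".join(f"• {name}:\n{description}" for name, description in factors)


def fill_prompt(template: str, **values: str) -> str:
    """Fill every placeholder, failing loudly instead of sending raw {braces} to the model."""
    missing = template_fields(template) - set(values)
    if missing:
        raise ValueError(f"unfilled placeholders: {sorted(missing)}")
    return template.format(**values)


# Hypothetical shortened template and inputs.
template = "Target Date:\n{target_date}\n\nAvailable Factor Descriptions:\n\n{factor_block}\n"
block = build_factor_block([("alpha001", "desc A"), ("alpha003", "desc B")])
prompt = fill_prompt(template, target_date="2024-01-02", factor_block=block)
```

str.format would require any literal braces in a template to be doubled ({{ }}). The example prompt above contains none; its output tags use angle brackets.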

Expected Output

Factor analysis reasoning: [Detailed explanation of selection logic...]
The most suitable factor selection for the current market is: <alpha_list><alpha001><alpha003><alpha007></alpha_list>
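
Downstream code usually needs the selected factor names as a list. The minimal parser below assumes that factor names are the tag names inside the last <alpha_list> block, since the reasoning may echo the format earlier. Three behaviors are illustrative design choices, not documented behavior of the released model:

- Truncating the list to 10 factors.
- Dropping duplicate names.
- Returning an empty list when no block is present (the "skip selection" case).

```python
import re
from typing import List

MAX_FACTORS = 10  # limit stated in the prompt's output requirements

_LIST_RE = re.compile(r"<alpha_list>(.*?)</alpha_list>", re.DOTALL)
_ITEM_RE = re.compile(r"<([A-Za-z0-9_]+)>")


def parse_alpha_list(response: str) -> List[str]:
    """Extract selected factor names from the model's <alpha_list> output."""
    blocks = _LIST_RE.findall(response)
    if not blocks:
        return []  # no list: treat as "no factor selected"
    names = list(dict.fromkeys(_ITEM_RE.findall(blocks[-1])))  # dedupe, keep order
    return names[:MAX_FACTORS]
```

For the expected output above, this returns ["alpha001", "alpha003", "alpha007"].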

Generation Recommendations

generation_config = {
    "do_sample": True,
    "temperature": 0.6,
    "top_p": 0.95,
    "max_new_tokens": 4096,
}

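For reproducible factor-screening results, we recommend greedy decoding:

temperature=0

which is also consistent with the evaluation setting reported in the paper. In transformers, request greedy decoding with do_sample=False rather than passing temperature=0. Sampling requires a strictly positive temperature and raises an error otherwise.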
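
The two recommended settings can be kept in one place with a small helper that returns keyword arguments for model.generate(). The helper name is hypothetical; the parameter values are the ones recommended above. With do_sample=False, transformers may warn that sampling parameters inherited from the model's generation config are ignored. That warning is harmless.

```python
from typing import Any, Dict


def generation_kwargs(reproducible: bool = False, max_new_tokens: int = 4096) -> Dict[str, Any]:
    """Keyword arguments for transformers' model.generate().

    reproducible=True selects greedy decoding (the "temperature=0" setting);
    otherwise sample with the recommended temperature and top_p.
    """
    if reproducible:
        return {"do_sample": False, "max_new_tokens": max_new_tokens}
    return {"do_sample": True, "temperature": 0.6, "top_p": 0.95, "max_new_tokens": max_new_tokens}
```

Usage: outputs = model.generate(**inputs, **generation_kwargs(reproducible=True), pad_token_id=tokenizer.eos_token_id).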

Limitations

  • This model is not a financial advisor.
  • Outputs should not be regarded as investment advice.
  • Performance reported in the paper is obtained under a specific backtesting protocol and does not guarantee future returns.
  • Users should perform their own validation before any real-world deployment.

Citation

If you use Alpha-R1 in your research, please cite our paper:

@article{jiang2025alphar1,
  title={Alpha-R1: Alpha Screening with LLM Reasoning via Reinforcement Learning},
  author={Jiang, Zuoyou and Zhao, Li and Sun, Rui and Sun, Ruohan and Li, Zhongjian and Li, Jing and Jiang, Daxin and Bai, Zuo and Hua, Cheng},
  journal={arXiv preprint arXiv:2512.23515},
  year={2025}
}

License

This model is released under the Apache-2.0 License.

Please also comply with the license of the base model (Qwen3-8B) when using this model.


Acknowledgements

Alpha-R1 is built upon the excellent Qwen3-8B model developed by Alibaba.

We thank the open-source community for making this work possible.
