
My own design may differ considerably from the reference design, and the simulation is not very thorough, so some real bugs may have gone undetected.

SCPU

Lab 4-1

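Procedure and Experimental Steps

Design Hierarchy Diagram and Description

[Figure: SCPU design hierarchy diagram]

The SCPU module consists of DataPath and SCPU_Ctrl. SCPU_Ctrl takes inst[6:2], inst[14:12], and inst[30] as inputs, which carry OPcode, Fun3, and Fun7 respectively.

DataPath takes the control signals from SCPU_Ctrl as inputs and outputs the ALU result, the data to be written to RAM, and the value of the program counter (PC).

The top-level SCPU module takes the clock clk, the instruction inst, the reset rst, and related signals as inputs, and outputs the results of running the single-cycle CPU.

Source Code

SCPU code: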

module SCPU(
    input clk,
    input rst,
    input MIO_ready,
    input [31:0]inst_in,
    input [31:0]Data_in,
    output CPU_MIO,
    output MemRW,
    output [31:0]PC_out,
    output [31:0]Data_out,
    output [31:0]Addr_out,
    output [2:0] ALU_Control
);

wire [1:0] ImmSel;
wire ALUSrc_B;
wire [1:0] MemtoReg;   
wire Jump;
wire Branch;
wire RegWrite;
SCPU_ctrl Ctrl(.OPcode(inst_in[6:2]), .Fun3(inst_in[14:12]), .Fun7(inst_in[30]), 
    .MIO_ready(MIO_ready), .ImmSel(ImmSel), .ALUSrc_B(ALUSrc_B), .MemtoReg(MemtoReg), 
    .Jump(Jump), .Branch(Branch), .RegWrite(RegWrite), .MemRW(MemRW), 
    .ALU_Control(ALU_Control), .CPU_MIO(CPU_MIO));

DataPath DP(.ImmSel(ImmSel), .ALUSrc_B(ALUSrc_B), .MemtoReg(MemtoReg), 
    .Jump(Jump), .Branch(Branch), .RegWrite(RegWrite), .ALU_Control(ALU_Control),
    .Data_in(Data_in), .clk(clk), .inst_field(inst_in), .rst(rst), 
    .ALU_out(Addr_out), .Data_out(Data_out), .PC_out(PC_out));

endmodule

The top-level code only does wiring: it connects the SCPU_Ctrl and DataPath modules and hooks up the corresponding inputs and outputs.

Lab 4-2

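Procedure and Experimental Steps

Design Hierarchy Diagram and Description

The structure of DataPath is shown below:

[Figure: DataPath block diagram]

DataPath consists of the following submodules: RegFile, ALU, and ImmGen.

RegFile and ALU were completed in Lab 1; ImmGen is completed in this lab.

ImmGen takes the ImmSel signal generated by SCPU_Ctrl, builds the corresponding immediate from inst, and passes it to downstream modules such as the ALU and RegFile.

SCPU_Ctrl identifies the instruction format from inst and generates the matching control signals for DataPath.

Source Code

1. ImmGen code: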

module ImmGen(
  input  [1:0]   ImmSel,     
  input  [31:0]  inst_field, 
  output reg [31:0] Imm_out
);

  always @(*) begin
    case (ImmSel)
      2'b00: // I-type 
        Imm_out = {{20{inst_field[31]}}, inst_field[31:20]}; 

      2'b01: // S-type
        Imm_out = {{20{inst_field[31]}}, inst_field[31:25], inst_field[11:7]}; //

      2'b10: // B-type 
        Imm_out = {{19{inst_field[31]}}, inst_field[31], inst_field[7], 
        inst_field[30:25], inst_field[11:8], 1'b0}; 

      2'b11: // J-type 
        Imm_out = {{11{inst_field[31]}}, inst_field[31], inst_field[19:12], 
        inst_field[20], inst_field[30:21], 1'b0}; // 

    endcase
  end
endmodule

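For I-type instructions, sign-extending inst[31:20] gives the immediate.

For S-type instructions, inst[31:25] and inst[11:7] are concatenated and then sign-extended.

For B-type and J-type instructions, the scattered immediate fields are put back in order, a 0 is appended as bit 0, and the result is sign-extended.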
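
As a worked example of the B-type reassembly, the sketch below decodes one of the branch test vectors used later in the ImmGen simulation by hand. The module name imm_b_example is hypothetical and not part of the lab sources; the concatenation is the same one used in the 2'b10 branch of ImmGen.

```verilog
`timescale 1ns/1ps
// Sketch only: imm_b_example is a hypothetical stand-alone module.
module imm_b_example;
  reg  [31:0] inst;
  wire [31:0] imm_b;

  // Same bit order as the B-type case of ImmGen:
  // imm[12]=inst[31], imm[11]=inst[7], imm[10:5]=inst[30:25],
  // imm[4:1]=inst[11:8], imm[0]=0, then sign extension from bit 12.
  assign imm_b = {{19{inst[31]}}, inst[31], inst[7],
                  inst[30:25], inst[11:8], 1'b0};

  initial begin
    inst = 32'hFE108AE3;              // beq x1, x1, -12
    #1;
    if (imm_b === 32'hFFFFFFF4)       // 0xFFFFFFF4 is -12 in two's complement
      $display("PASS: imm_b = %0d", $signed(imm_b));
    else
      $display("FAIL: imm_b = %h", imm_b);
    $finish;
  end
endmodule
```

Since bit 0 is always 0, branch and jump offsets are always multiples of 2.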

2. SCPU_Ctrl code:

module SCPU_ctrls(
  input [4:0]       OPcode, 
  input [2:0]       Fun3,
  input             Fun7,
  input             MIO_ready,
  output reg [1:0]  ImmSel,
  output reg        ALUSrc_B,
  output reg [1:0]  MemtoReg,
  output reg        Jump,
  output reg        Branch,
  output reg        RegWrite,
  output reg        MemRW,
  output reg [3:0]  ALU_Control,
  output reg        CPU_MIO
);
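// Give every output a default at the top of the combinational block so that
// no latch is inferred and MemRW cannot stay high after a store.
always @(*) begin
  ImmSel      = 2'b00;
  ALUSrc_B    = 1'b0;
  MemtoReg    = 2'b00; // 0: ALU result  1: load from RAM to reg  2/3: PC + 4 (JAL)
  Jump        = 1'b0;
  Branch      = 1'b0;
  RegWrite    = 1'b0;
  MemRW       = 1'b0;  // 1: write to RAM  0: read from RAM
  ALU_Control = 4'b0000;
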
  case (OPcode)
    5'b01100: begin // R-type 
      ALUSrc_B = 1'b0;
      MemtoReg = 2'b00;
      RegWrite = 1'b1;
      Jump     = 1'b0;
      Branch   = 1'b0;

      case ({Fun3, Fun7})
        4'b0000: ALU_Control = 4'b0000; // ADD
        4'b0001: ALU_Control = 4'b0001; // SUB
        4'b0010: ALU_Control = 4'b0010; // SLL
        4'b0100: ALU_Control = 4'b0011; // SLT
        4'b0110: ALU_Control = 4'b0100; // SLTU
        4'b1000: ALU_Control = 4'b0101; // XOR
        4'b1010: ALU_Control = 4'b0110; // SRL
        4'b1011: ALU_Control = 4'b0111; // SRA
        4'b1100: ALU_Control = 4'b1000; // OR
        4'b1110: ALU_Control = 4'b1001; // AND
      endcase
    end
    5'b00000: begin // Load 
      ALUSrc_B  = 1'b1;
      MemtoReg  = 2'b01;
      RegWrite  = 1'b1;
      ImmSel    = 2'b00;
      MemRW     = 1'b0; // read
      Jump      = 1'b0;
      Branch    = 1'b0;
      ALU_Control = 4'b0000; // ADD for address calculation
    end
    5'b01000: begin // Store 
      ALUSrc_B  = 1'b1;
      MemtoReg = 2'bx;
      RegWrite  = 1'b0;
      ImmSel    = 2'b01;
      MemRW     = 1'b1; // write
      Jump      = 1'b0;
      Branch    = 1'b0; 
      ALU_Control = 4'b0000; // ADD for address calculation
    end
    5'b11000: begin // Branch
      ALUSrc_B  = 1'b0;
      MemtoReg  = 2'bx; // new
      RegWrite  = 1'b0; //
      ImmSel    = 2'b10; 
      Branch    = 1'b1;
      Jump      = 1'b0;
      ALU_Control = 4'b0001; // SUB for branch condition check
    end
    5'b11011: begin // JAL 
      ALUSrc_B = 1'bx;
      MemtoReg = 2'b10; // PC + 4
      RegWrite = 1'b1; //
      ImmSel   = 2'b11;
      Branch   = 1'b0;
      Jump     = 1'b1;
    end
    5'b00100: begin // I-type ALU
      ALUSrc_B  = 1'b1;
      MemtoReg  = 2'b00;
      RegWrite  = 1'b1;
      ImmSel    = 2'b00;
      Jump      = 1'b0;
      Branch    = 1'b0;
      case (Fun3)
        3'b000: ALU_Control = 4'b0000; // ADDI
        3'b001: ALU_Control = 4'b0010; // SLLI
        3'b010: ALU_Control = 4'b0011; // SLTI
        3'b011: ALU_Control = 4'b0100; // SLTIU
        3'b100: ALU_Control = 4'b0101; // XORI
        3'b101: ALU_Control = (Fun7) ? 4'b0111 : 4'b0110; // SRAI / SRLI
        3'b110: ALU_Control = 4'b1000; // ORI
        3'b111: ALU_Control = 4'b1001; // ANDI
      endcase
    end
  endcase
end
endmodule

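To support R-type (ALU) instructions, we add the ALUSrc_B and ALU_Control signals. When ALUSrc_B=0, register rs2 is selected as ALU input B.

ALU_Control selects which basic operation the ALU performs and outputs.

The ALU_Control values map one-to-one to the operation codes of the ALU from Lab 1.

RegWrite drives the RegFile write enable so that the ALU result can be stored in a register.

MemtoReg selects the value written to the register; MemtoReg=0 writes the ALU result.

To support I-type instructions (loads and immediate ALU operations), we add ImmSel, which is fed to ImmGen to select the immediate type. ALUSrc_B=1 selects the immediate as the second ALU operand B.

For loads, MemtoReg=1 writes the value read from memory to the register.

To support stores, we add MemRW; MemRW=1 enables the memory write.

To support B-type and J-type instructions, we add the Branch and Jump signals; MemtoReg=2 writes PC+4 to the register.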
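
The one-to-one mapping between ALU_Control and the ALU operations is easier to read with named constants. The sketch below is only an illustration: the module name alu_op_names_example, the ALU_* constant names, and the helper function r_type_op are hypothetical and do not appear in the lab sources. The numeric codes are the ones used in the R-type case of SCPU_ctrls, and the two instructions come from the SCPU_Ctrl testbench.

```verilog
`timescale 1ns/1ps
// Sketch only: all names in this module are hypothetical.
module alu_op_names_example;
  // Operation codes copied from the R-type case of SCPU_ctrls.
  localparam [3:0] ALU_ADD  = 4'd0, ALU_SUB = 4'd1, ALU_SLL = 4'd2,
                   ALU_SLT  = 4'd3, ALU_SLTU = 4'd4, ALU_XOR = 4'd5,
                   ALU_SRL  = 4'd6, ALU_SRA = 4'd7, ALU_OR  = 4'd8,
                   ALU_AND  = 4'd9;

  // Map Fun3 and inst[30] of an R-type instruction to an ALU code.
  function [3:0] r_type_op;
    input [2:0] fun3;
    input       fun7;
    begin
      case (fun3)
        3'b000:  r_type_op = fun7 ? ALU_SUB : ALU_ADD;
        3'b001:  r_type_op = ALU_SLL;
        3'b010:  r_type_op = ALU_SLT;
        3'b011:  r_type_op = ALU_SLTU;
        3'b100:  r_type_op = ALU_XOR;
        3'b101:  r_type_op = fun7 ? ALU_SRA : ALU_SRL;
        3'b110:  r_type_op = ALU_OR;
        default: r_type_op = ALU_AND;   // fun3 == 3'b111
      endcase
    end
  endfunction

  reg [31:0] inst;
  initial begin
    inst = 32'h400080B3;                // sub x1, x1, x0
    if (r_type_op(inst[14:12], inst[30]) === ALU_SUB)
      $display("PASS: sub -> ALU_SUB");
    else
      $display("FAIL: sub");
    inst = 32'h002150B3;                // srl x1, x2, x2
    if (r_type_op(inst[14:12], inst[30]) === ALU_SRL)
      $display("PASS: srl -> ALU_SRL");
    else
      $display("FAIL: srl");
    $finish;
  end
endmodule
```

Unlike the case on {Fun3, Fun7} in the controller, this helper ignores inst[30] for operations that have no alternate form; that is a simplification of the sketch, not a statement about the lab design.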

3. DataPath code:

module DataPaths(
    input [1:0] ImmSel, input ALUSrc_B, input [1:0] MemtoReg, input Jump, input Branch, input RegWrite, input [3:0] ALU_Control,
    input [31:0] Data_in, input clk, input [31:0] inst_field, input rst, 
    output [31:0] Reg00, output [31:0] Reg01, output [31:0] Reg02, output [31:0] Reg03,
    output [31:0] Reg04, output [31:0] Reg05, output [31:0] Reg06, output [31:0] Reg07,
    output [31:0] Reg08, output [31:0] Reg09, output [31:0] Reg10, output [31:0] Reg11,
    output [31:0] Reg12, output [31:0] Reg13, output [31:0] Reg14, output [31:0] Reg15,
    output [31:0] Reg16, output [31:0] Reg17, output [31:0] Reg18, output [31:0] Reg19,
    output [31:0] Reg20, output [31:0] Reg21, output [31:0] Reg22, output [31:0] Reg23,
    output [31:0] Reg24, output [31:0] Reg25, output [31:0] Reg26, output [31:0] Reg27,
    output [31:0] Reg28, output [31:0] Reg29, output [31:0] Reg30, output [31:0] Reg31,
    output [31:0] ALU_out, output [31:0] Data_out, output [31:0] PC_out);

wire and_2;
wire [31:0] and_2_out;
wire ALU_zero;
wire [31:0] add_0;
wire [31:0] ALU_A;
wire [31:0] ALU_B;
reg [31:0] reg_wt_data;
wire [31:0] PC4;
wire [31:0] PC_in;
wire [31:0] ImmOut;

assign PC4 = PC_out + 32'd4;
assign and_2 = Branch & ALU_zero;
assign add_0 = ImmOut + PC_out;


always @ (*) begin
    case (MemtoReg)
        2'd0: reg_wt_data = ALU_out; //
        2'd1: reg_wt_data = Data_in;
        2'd2: reg_wt_data = PC4;
        2'd3: reg_wt_data = PC4;
    endcase    
end

assign ALU_B = ALUSrc_B ? ImmOut : Data_out;
assign and_2_out = and_2 ? add_0 : PC4;
assign PC_in = Jump ? add_0 : and_2_out;

ALU alu(.A(ALU_A), .B(ALU_B), .ALU_operation(ALU_Control), .res(ALU_out), .zero(ALU_zero));
wire [4:0] RS1, RS2, WT;
assign RS1 = inst_field[19:15];
assign RS2 = inst_field[24:20];
assign WT = inst_field[11:7];
Regs regs(.clk(clk), .rst(rst), .Rs1_addr(RS1), .Rs2_addr(RS2), 
    .Wt_addr(WT), .Wt_data(reg_wt_data), .RegWrite(RegWrite), 
    .Rs1_data(ALU_A), .Rs2_data(Data_out),
    .Reg00(Reg00),.Reg01(Reg01),.Reg02(Reg02),.Reg03(Reg03),
    .Reg04(Reg04),.Reg05(Reg05),.Reg06(Reg06),.Reg07(Reg07),
    .Reg08(Reg08),.Reg09(Reg09),.Reg10(Reg10),.Reg11(Reg11),
    .Reg12(Reg12),.Reg13(Reg13),.Reg14(Reg14),.Reg15(Reg15),
    .Reg16(Reg16),.Reg17(Reg17),.Reg18(Reg18),.Reg19(Reg19),
    .Reg20(Reg20),.Reg21(Reg21),.Reg22(Reg22),.Reg23(Reg23),
    .Reg24(Reg24),.Reg25(Reg25),.Reg26(Reg26),.Reg27(Reg27),
    .Reg28(Reg28),.Reg29(Reg29),.Reg30(Reg30),.Reg31(Reg31));

ImmGen immgen(.ImmSel(ImmSel), .inst_field(inst_field), .Imm_out(ImmOut));

Reg PC(.clk(clk), .rst(rst), .CE(1'b1), .D(PC_in), .Q(PC_out));

endmodule

The DataPath code instantiates ImmGen, RegFile, ALU, and the other submodules, and uses the control signals passed in from SCPU_Ctrl to steer them.

4. SCPU code:

module SCPU(
    input clk,
    input rst,
    input MIO_ready,
    input [31:0] inst_in,
    input [31:0] Data_in,
    output CPU_MIO,
    output MemRW,
    output [31:0] PC_out,
    output [31:0] Data_out,
    output [31:0] Addr_out,
    output [31:0] Reg00, output [31:0] Reg01,output [31:0] Reg02,output [31:0] Reg03,
    output [31:0] Reg04, output [31:0] Reg05,output [31:0] Reg06,output [31:0] Reg07,
    output [31:0] Reg08, output [31:0] Reg09,output [31:0] Reg10,output [31:0] Reg11,
    output [31:0] Reg12, output [31:0] Reg13,output [31:0] Reg14,output [31:0] Reg15,
    output [31:0] Reg16, output [31:0] Reg17,output [31:0] Reg18,output [31:0] Reg19,
    output [31:0] Reg20, output [31:0] Reg21,output [31:0] Reg22,output [31:0] Reg23,
    output [31:0] Reg24, output [31:0] Reg25,output [31:0] Reg26,output [31:0] Reg27,
    output [31:0] Reg28, output [31:0] Reg29,output [31:0] Reg30,output [31:0] Reg31,
    output [3:0] ALU_Control
);

wire [1:0] ImmSel;
wire ALUSrc_B;
wire [1:0] MemtoReg;   
wire Jump;
wire Branch;
wire RegWrite;
wire [4:0] OP;
wire [2:0] FUN3;
wire FUN7;
assign OP = inst_in[6:2];
assign FUN3 = inst_in[14:12];
assign FUN7 = inst_in[30];
SCPU_ctrls Ctrl(.OPcode(OP), .Fun3(FUN3), .Fun7(FUN7), 
    .MIO_ready(MIO_ready), .ImmSel(ImmSel), .ALUSrc_B(ALUSrc_B), .MemtoReg(MemtoReg), 
    .Jump(Jump), .Branch(Branch), .RegWrite(RegWrite), .MemRW(MemRW), .ALU_Control(ALU_Control),
    .CPU_MIO(CPU_MIO));

DataPaths DP(.ImmSel(ImmSel), .ALUSrc_B(ALUSrc_B), .MemtoReg(MemtoReg), 
    .Jump(Jump), .Branch(Branch), .RegWrite(RegWrite), .ALU_Control(ALU_Control),
    .Data_in(Data_in), .clk(clk), .inst_field(inst_in), .rst(rst), 
    .ALU_out(Addr_out), .Data_out(Data_out), .PC_out(PC_out),
    .Reg00(Reg00),.Reg01(Reg01),.Reg02(Reg02),.Reg03(Reg03),
    .Reg04(Reg04),.Reg05(Reg05),.Reg06(Reg06),.Reg07(Reg07),
    .Reg08(Reg08),.Reg09(Reg09),.Reg10(Reg10),.Reg11(Reg11),
    .Reg12(Reg12),.Reg13(Reg13),.Reg14(Reg14),.Reg15(Reg15),
    .Reg16(Reg16),.Reg17(Reg17),.Reg18(Reg18),.Reg19(Reg19),
    .Reg20(Reg20),.Reg21(Reg21),.Reg22(Reg22),.Reg23(Reg23),
    .Reg24(Reg24),.Reg25(Reg25),.Reg26(Reg26),.Reg27(Reg27),
    .Reg28(Reg28),.Reg29(Reg29),.Reg30(Reg30),.Reg31(Reg31));

endmodule

5. Other module code:

ALU

module ALU(A, B, ALU_operation, res, zero, overflow );
input wire [31:0] A;
input wire [31:0] B;
input wire [3:0] ALU_operation;
output reg [31:0] res;
output wire zero;
output reg overflow;
wire [31:0] res_and,res_or,res_add,res_sub,
  res_xor,res_slt,res_sltu,res_sll,res_srl,res_sra;

wire [4:0] bit;
assign res_xor = A^B;
assign res_and = A&B;
assign res_or = A|B;

assign res_add = $signed($signed(A)+$signed(B));
assign res_sub = $signed($signed(A)-$signed(B));
assign res_slt = ($signed(A) < $signed(B)) ? 1 : 0;
assign res_sltu = ($unsigned(A) < $unsigned(B)) ? 1 : 0;

assign bit = $unsigned(B[4:0]);
assign res_sll = A<<B[4:0];
assign res_srl = A>>B[4:0];    
assign res_sra = $signed(A)>>>$signed(B[4:0]);
// @(*) also tracks the res_* wires, so res cannot pick up a stale value in simulation
always @(*) begin
    case (ALU_operation)
    4'd0: begin res=res_add;overflow=(res<A);end
    4'd1: begin res=res_sub;overflow=(res>A);end
    4'd2: begin res=res_sll;overflow=0;end
    4'd3: begin res=res_slt;overflow=0;end
    4'd4: begin res=res_sltu;overflow=0;end
    4'd5: begin res=res_xor;overflow=0;end
    4'd6: begin res=res_srl;overflow=0;end
    4'd7: begin res=res_sra;overflow=0;end
    4'd8: begin res=res_or;overflow=0;end
    4'd9: begin res=res_and;overflow=0;end
    default: begin res=32'h0;overflow=0;end
    endcase    
end
assign zero = (res==0)? 1: 0;
endmodule

RegFile

module Regs(
    input wire clk,
    input wire rst,
    input wire [4:0] Rs1_addr, 
    input wire [4:0] Rs2_addr, 
    input wire [4:0] Wt_addr, 
    input wire [31:0] Wt_data, 
    input wire RegWrite, 
    output wire [31:0] Rs1_data, 
    output wire [31:0] Rs2_data,
    output wire [31:0] Reg00,
    output wire [31:0] Reg01,
    output wire [31:0] Reg02,
    output wire [31:0] Reg03,
    output wire [31:0] Reg04,
    output wire [31:0] Reg05,
    output wire [31:0] Reg06,
    output wire [31:0] Reg07,
    output wire [31:0] Reg08,
    output wire [31:0] Reg09,
    output wire [31:0] Reg10,
    output wire [31:0] Reg11,
    output wire [31:0] Reg12,
    output wire [31:0] Reg13,
    output wire [31:0] Reg14,
    output wire [31:0] Reg15,
    output wire [31:0] Reg16,
    output wire [31:0] Reg17,
    output wire [31:0] Reg18,
    output wire [31:0] Reg19,
    output wire [31:0] Reg20,
    output wire [31:0] Reg21,
    output wire [31:0] Reg22,
    output wire [31:0] Reg23,
    output wire [31:0] Reg24,
    output wire [31:0] Reg25,
    output wire [31:0] Reg26,
    output wire [31:0] Reg27,
    output wire [31:0] Reg28,
    output wire [31:0] Reg29,
    output wire [31:0] Reg30,
    output wire [31:0] Reg31
);
     integer i;
     reg [31:0] Reg [31:0];
     initial begin
         for(i = 0; i < 32; i = i + 1) begin
            Reg[i] <= 32'b0;
        end 
     end


    assign Reg00 = Reg[0];
    assign Reg01 = Reg[1];
    assign Reg02 = Reg[2];
    assign Reg03 = Reg[3];
    assign Reg04 = Reg[4];
    assign Reg05 = Reg[5];
    assign Reg06 = Reg[6];
    assign Reg07 = Reg[7];
    assign Reg08 = Reg[8];
    assign Reg09 = Reg[9];
    assign Reg10 = Reg[10];
    assign Reg11 = Reg[11];
    assign Reg12 = Reg[12];
    assign Reg13 = Reg[13];
    assign Reg14 = Reg[14];
    assign Reg15 = Reg[15];
    assign Reg16 = Reg[16];
    assign Reg17 = Reg[17];
    assign Reg18 = Reg[18];
    assign Reg19 = Reg[19];
    assign Reg20 = Reg[20];
    assign Reg21 = Reg[21];
    assign Reg22 = Reg[22];
    assign Reg23 = Reg[23];
    assign Reg24 = Reg[24];
    assign Reg25 = Reg[25];
    assign Reg26 = Reg[26];
    assign Reg27 = Reg[27];
    assign Reg28 = Reg[28];
    assign Reg29 = Reg[29];
    assign Reg30 = Reg[30];
    assign Reg31 = Reg[31];
   always @(posedge clk or posedge rst) begin
        if (rst) begin

            for (i = 0; i < 32; i = i + 1) begin
                Reg[i] <= 32'b0;
            end
        end
        else begin
            if (RegWrite && (Wt_addr != 5'b0)) begin
                Reg[Wt_addr] <= Wt_data;
            end
            else begin
                Reg[Wt_addr] <= Reg[Wt_addr];
            end
        end
    end
    assign Rs1_data = Reg[Rs1_addr];
    assign Rs2_data = Reg[Rs2_addr];

endmodule

PC

module Reg(
    input wire clk,
    input wire rst,
    input wire CE,
    input [31:0] D,
    output [31:0] Q
);
    reg [31:0] Reg;
    initial begin
        Reg = 32'd0; 
    end
    always @(posedge clk or posedge rst) begin
        if(rst) begin
            Reg <= 32'd0;  
        end
        else begin
            if(CE) Reg <= D;
            else Reg <= Reg;
        end
    end    
    assign Q = Reg;
endmodule

Key Simulation Steps

For simulation we build a testbench platform that instantiates the SCPU, ROM, and RAM and runs the test program.

1. testbench code:

module testbench(
    input wire clk,
    input wire rst
);

    /* SCPU output */
    wire [31:0] Addr_out;
    wire [31:0] Data_out;       
    wire        CPU_MIO;
    wire        MemRW;
    wire [31:0] PC_out;
    /* RAM output */
    wire [31:0] douta;
    /* ROM output */
    wire [31:0] spo;
SCPU u0(
        .clk(clk),
        .rst(rst),
        .Data_in(douta),
        .MIO_ready(CPU_MIO),
        .inst_in(spo),
        .Addr_out(Addr_out),
        .Data_out(Data_out),
        .CPU_MIO(CPU_MIO),
        .MemRW(MemRW),
        .PC_out(PC_out)
    );

    RAM_B u1(
        .clka(~clk),
        .wea(MemRW),
        .addra(Addr_out[11:2]),
        .dina(Data_out),
        .douta(douta)
    );

    ROM_D u2(
        .a(PC_out[11:2]),
        .spo(spo)
    );
endmodule

The testbench instantiates the SCPU, ROM, and RAM and wires them together.

Simulation code:

ImmGen simulation

`timescale 1ns/1ps
`define IMM_SEL_WIDTH 2
`define IMM_SEL_I   `IMM_SEL_WIDTH'd0
`define IMM_SEL_S   `IMM_SEL_WIDTH'd1
`define IMM_SEL_B   `IMM_SEL_WIDTH'd2
`define IMM_SEL_J   `IMM_SEL_WIDTH'd3
module ImmGen_tb();
 reg [1:0]   ImmSel;
 reg [31:0]  inst_field;
 wire[31:0]  Imm_out;

 ImmGen m0 (.ImmSel(ImmSel), .inst_field(inst_field), .Imm_out(Imm_out));

`define LET_INST_BE(inst) \
 inst_field = inst; \
 #5;

 initial begin
   $dumpfile("ImmGen.vcd");
   $dumpvars(1, ImmGen_tb);

   #5;
   /* Test for I-Type */
   ImmSel = `IMM_SEL_I;
   `LET_INST_BE(32'h3E810093);   //addi x1, x2, 1000
   `LET_INST_BE(32'h00A14093);   //xori x1, x2, 10
   `LET_INST_BE(32'h00116093);   //ori x1, x2, 1
   `LET_INST_BE(32'h00017093);   //andi x1, x2, 0
   `LET_INST_BE(32'h01411093);   //slli x1, x2, 20
   `LET_INST_BE(32'h00515093);   //srli x1, x2, 5
   `LET_INST_BE(32'h41815093);   //srai x1, x2, 24
   `LET_INST_BE(32'hFFF12093);   //slti x1, x2, -1
   `LET_INST_BE(32'h3FF13093);   //sltiu x1, x2, 1023
   `LET_INST_BE(32'h0E910083);   //lb x1, 233(x2)

   #20;
   /* Test for S-Type */
   ImmSel = `IMM_SEL_S;
   `LET_INST_BE(32'hFE110DA3);   //sb x1, -5(x2)
   `LET_INST_BE(32'h00211023);   //sh x2, 0(x2)
   `LET_INST_BE(32'h00C0A523);   //sw x12, 10(x1)

   #20;
   /* Test for B-Type */
   ImmSel = `IMM_SEL_B;
   `LET_INST_BE(32'hFE108AE3);   //beq x1, x1, -12
   `LET_INST_BE(32'h00211463);   //bne x2, x2, 8
   `LET_INST_BE(32'h0031CA63);   //blt x3, x3, 20
   `LET_INST_BE(32'hFE4256E3);   //bge x4, x4, -20

   #20;
   /* Test for J-Type */
   ImmSel = `IMM_SEL_J;
   `LET_INST_BE(32'hF9DFF06F);   //jal x0, -100
   `LET_INST_BE(32'h3FE000EF);   //jal x1, 1023 NOTE: does ImmGen output 1023?
   #50; $finish();
 end
endmodule

SCPU_Ctrl simulation

`timescale 1ns/1ps

`include "Lab4_header.vh"

module SCPU_ctrl_tb();
 reg [4:0]     OPcode;
 reg [2:0]     Fun3;
 reg           Fun7;
 reg           MIO_ready;
 wire [1:0]    ImmSel;
 wire          ALUSrc_B;
 wire [1: 0]   MemtoReg;
 wire          Jump;
 wire          Branch;
 wire          RegWrite;
 wire          MemRW;
 wire [3:0]    ALU_Control;
 wire          CPU_MIO;

 SCPU_ctrls m0 (
   .OPcode(OPcode),
   .Fun3(Fun3),
   .Fun7(Fun7),
   .MIO_ready(MIO_ready),
   .ImmSel(ImmSel),
   .ALUSrc_B(ALUSrc_B),
   .MemtoReg(MemtoReg),
   .Jump(Jump),
   .Branch(Branch),
   .RegWrite(RegWrite),
   .MemRW(MemRW),
   .ALU_Control(ALU_Control),
   .CPU_MIO(CPU_MIO)
 );

 reg [31:0] inst_for_test;

`define LET_INST_BE(inst) \
 inst_for_test = inst; \
 OPcode = inst_for_test[6:2]; \
 Fun3 = inst_for_test[14:12]; \
 Fun7 = inst_for_test[30]; \
 #50;

 initial begin
   $dumpfile("SCPU_ctrl.vcd");
   $dumpvars(1, SCPU_ctrl_tb);

   #5;
   MIO_ready = 0;
   #5;
   `LET_INST_BE(32'h001100B3);   //add x1, x2, x1
   `LET_INST_BE(32'h400080B3);   //sub x1, x1, x0
   `LET_INST_BE(32'h002140B3);   //xor x1, x2, x2
   `LET_INST_BE(32'h002160B3);   //or x1, x2, x2
   `LET_INST_BE(32'h002170B3);   //and x1, x2, x2
   `LET_INST_BE(32'h002150B3);   //srl x1, x2, x2
   `LET_INST_BE(32'h002120B3);   //slt x1, x2, x2
   `LET_INST_BE(32'h3E810093);   //addi x1, x2, 1000
   `LET_INST_BE(32'h00A14093);   //xori x1, x2, 10
   `LET_INST_BE(32'h00116093);   //ori x1, x2, 1
   `LET_INST_BE(32'h00017093);   //andi x1, x2, 0
   `LET_INST_BE(32'h00515093);   //srli x1, x2, 5
   `LET_INST_BE(32'hFFF12093);   //slti x1, x2, -1
   `LET_INST_BE(32'h00812083);   //lw x1, 8(x2)
   `LET_INST_BE(32'h00C0A823);   //sw x12, 16(x1)
   `LET_INST_BE(32'hFE108AE3);   //beq x1, x1, -12
   `LET_INST_BE(32'hF9DFF06F);   //jal x0, -100
   `LET_INST_BE(32'h3FE000EF);   //jal x1, 1023

   #50; $finish();
 end
endmodule

SCPU simulation

module testbench_tb();

    reg clk;
    reg rst;
    testbench m0(.clk(clk), .rst(rst));

    initial begin
        clk = 1'b0;
        rst = 1'b1;
        #50;
        rst = 1'b0;
    end
    always #10 clk = ~clk;

endmodule
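
testbench_tb has no stop condition, so the waveform has to be inspected by hand. A possible addition is a monitor that ends the run once x31 reaches the pass value 0x666 or a cycle limit is hit. The sketch below is self-contained and therefore uses a stand-in register instead of the real CPU; the module name pass_monitor_example, the stand-in x31, and the 2000-cycle limit are assumptions made for illustration.

```verilog
`timescale 1ns/1ps
// Sketch only: a stand-in register replaces the real SCPU here.
module pass_monitor_example;
  reg        clk    = 1'b0;
  reg        rst    = 1'b1;
  reg [31:0] x31    = 32'b0;   // stand-in for the Reg31 output of SCPU (assumption)
  integer    cycles = 0;

  always #10 clk = ~clk;       // same 20 ns clock period as testbench_tb
  initial #50 rst = 1'b0;      // same reset release time as testbench_tb

  // Stand-in behaviour: pretend the program reaches pass_5 after 100 cycles.
  always @(posedge clk)
    if (!rst && cycles == 100) x31 <= 32'h666;

  // Monitor: stop on the pass value, or give up after a fixed number of cycles.
  always @(posedge clk) begin
    cycles <= cycles + 1;
    if (x31 === 32'h666) begin
      $display("PASS after %0d cycles", cycles);
      $finish;
    end else if (cycles >= 2000) begin
      $display("TIMEOUT: x31 = %h", x31);
      $finish;
    end
  end
endmodule
```

In the real testbench, the stand-in could be replaced by a hierarchical reference such as m0.u0.Reg31 (instance names taken from testbench_tb and testbench), and the cycle limit would need to be chosen to cover the whole test program.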

Test program for SCPU simulation and on-board verification

    j    start            # 00
dummy:
    nop                   # 04
    nop                   # 08
    nop                   # 0C
    nop                   # 10
    nop                   # 14
    nop                   # 18
    nop                   # 1C
    j    dummy

start:
    beq  x0, x0, pass_0
    li   x31, 0
    j    dummy
pass_0:
    li   x1, -1           # x1=FFFFFFFF
    xori x3, x1, 1        # x3=FFFFFFFE
    add  x3, x3, x3       # x3=FFFFFFFC
    add  x3, x3, x3       # x3=FFFFFFF8
    add  x3, x3, x3       # x3=FFFFFFF0
    add  x3, x3, x3       # x3=FFFFFFE0
    add  x3, x3, x3       # x3=FFFFFFC0
    add  x3, x3, x3       # x3=FFFFFF80
    add  x3, x3, x3       # x3=FFFFFF00
    add  x3, x3, x3       # x3=FFFFFE00
    add  x3, x3, x3       # x3=FFFFFC00
    add  x3, x3, x3       # x3=FFFFF800
    add  x3, x3, x3       # x3=FFFFF000
    add  x3, x3, x3       # x3=FFFFE000
    add  x3, x3, x3       # x3=FFFFC000
    add  x3, x3, x3       # x3=FFFF8000
    add  x3, x3, x3       # x3=FFFF0000
    add  x3, x3, x3       # x3=FFFE0000
    add  x3, x3, x3       # x3=FFFC0000
    add  x3, x3, x3       # x3=FFF80000
    add  x3, x3, x3       # x3=FFF00000
    add  x3, x3, x3       # x3=FFE00000
    add  x3, x3, x3       # x3=FFC00000
    add  x3, x3, x3       # x3=FF800000
    add  x3, x3, x3       # x3=FF000000
    add  x3, x3, x3       # x3=FE000000
    add  x3, x3, x3       # x3=FC000000
    add  x5, x3, x3       # x5=F8000000
    add  x3, x5, x5       # x3=F0000000
    add  x4, x3, x3       # x4=E0000000
    add  x6, x4, x4       # x6=C0000000
    add  x7, x6, x6       # x7=80000000
    ori  x8, zero, 1      # x8=00000001
    ori  x28, zero, 31
    srl  x29, x7, x28     # x29=00000001
    beq  x8, x29, pass_1
    li   x31, 1
    j    dummy

pass_1:
    nop
    sub  x3, x6, x7       # x3=40000000
    sub  x4, x7, x3       # x4=40000000
    slti x9, x0, 1        # x9=00000001
    slt  x10, x3, x4
    slt  x10, x4, x3      # x10=00000000
    beq  x9, x10, dummy   # branch when x3 != x4
    srli x29, x3, 30      # x29=00000001
    beq  x29, x9, pass_2
    li   x31, 2
    j    dummy

pass_2:
    nop
# Test signed set-less-than
    slti x10, x1, 3       # x10=00000001
    slt  x11, x5, x1      # signed(0xF8000000) < -1
                        # x11=00000001
    slt  x12, x1, x3      # x12=00000001
    andi x10, x10, 0xff
    and  x10, x10, x11
    and  x10, x10, x12    # x10=00000001
    li   x11, 1
    beq  x10, x11, pass_3
    li   x31, 3
    j    dummy

pass_3:
    nop
    or   x11, x7, x3      # x11=C0000000
    beq  x11, x6, pass_4
    li   x31, 4
    j    dummy

pass_4:
    nop
    li   x18, 0x20        # base addr=0x20
### uncomment instr. below when simulating on venus
    # srli x18, x7, 3     # base addr=10000000
    sw   x5, 0(x18)       # mem[0x20]=F8000000
    sw   x4, 4(x18)       # mem[0x24]=40000000
    lw   x29, 0(x18)      # x29=mem[0x20]=F8000000
    xor  x29, x29, x5     # x29=00000000
    sw   x6, 0(x18)       # mem[0x20]=C0000000
    lw   x30, 0(x18)      # x30=mem[0x20]=C0000000
    xor  x29, x29, x30    # x29=C0000000
    beq  x6, x29, pass_5
    li   x31, 5
    j    dummy

pass_5:
    li   x31, 0x666
    j    dummy

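Results and Analysis

Simulation Results

1. ImmGen simulation:

The ImmGen simulation feeds in instructions of four types, and a different immediate is produced for each.

The simulated waveform matches the reference waveform, so the result meets the requirements.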
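
The last J-type test vector, 32'h3FE000EF, is a useful check of how bit 0 is handled: a J-type immediate always has bit 0 equal to 0, so the decoded offset is even (1022 rather than 1023). The sketch below reproduces that decoding outside the design; the module name imm_j_example is hypothetical, and the concatenation is the one from the J-type case of ImmGen.

```verilog
`timescale 1ns/1ps
// Sketch only: imm_j_example is a hypothetical stand-alone module.
module imm_j_example;
  reg  [31:0] inst;
  // imm[20]=inst[31], imm[19:12]=inst[19:12], imm[11]=inst[20],
  // imm[10:1]=inst[30:21], imm[0]=0, then sign extension from bit 20.
  wire [31:0] imm_j = {{11{inst[31]}}, inst[31], inst[19:12],
                       inst[20], inst[30:21], 1'b0};

  initial begin
    inst = 32'h3FE000EF;                 // inst[30:21] = 10'h1FF, inst[20] = 0
    #1;
    if (imm_j === 32'd1022)
      $display("PASS: 3FE000EF decodes to %0d", imm_j);
    else
      $display("FAIL: imm_j = %h", imm_j);

    inst = 32'hF9DFF06F;                 // jal x0, -100
    #1;
    if (imm_j === 32'hFFFFFF9C)          // 0xFFFFFF9C is -100 in two's complement
      $display("PASS: F9DFF06F decodes to %0d", $signed(imm_j));
    else
      $display("FAIL: imm_j = %h", imm_j);
    $finish;
  end
endmodule
```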

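2. SCPU_Ctrl simulation:

The SCPU_Ctrl simulation feeds in instructions of six types, and a different set of control signals is produced for each.

For example, the first instruction, 001100B3, is add x1, x2, x1.

In the waveform, RegWrite=1 enables the RegFile write.

ALUSrc_B=0 selects rs2, read from the RegFile, as ALU input B.

ALU_Control=0 selects the add operation.

As another example, the last instruction, 3FE000EF, decodes to jal x1, 1022 (it is commented as jal x1, 1023 in the testbench).

Here RegWrite=1 enables the RegFile write so that the return address is stored in x1.

Jump=1 marks it as a J-type instruction.

After checking, the control signals for all instructions are correct and match expectations.

3. SCPU simulation:

[Figure: SCPU simulation waveform, beginning]

[Figure: SCPU simulation waveform, end]

Because the full waveform is too long, only the beginning and the end are shown.

At the end of the waveform, Reg31 becomes 0x666, which shows that all preceding tests passed; the result matches expectations.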

Lab 4-3

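Procedure and Experimental Steps

Design Hierarchy Diagram and Description

The complete DataPath after the instruction extension is shown below:

[Figure: complete DataPath block diagram after the instruction extension]

Compared with the previous DataPath, the main changes are:

  1. Widened the Branch signal and added multiplexers to support the different B-type instructions
  2. Widened the Jump signal to support the jalr instruction
  3. Widened the RAM write enable so that each of the 4 bytes can be written separately
  4. Added paths for lui and auipc: ImmGen produces an additional immediate type, and reg_wt_data has additional sources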
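
The per-byte write enable from item 3 can be sketched as follows. This is a minimal illustration, assuming naturally aligned accesses and a RAM whose byte 0 sits in bits [7:0]; the module name store_lane_example and the functions byte_wea and store_align are hypothetical, while the WHBU encodings (SB = 4'b0010, SH = 4'b0100, SW = 4'b1000) are the ones used by SCPU_ctrls.

```verilog
`timescale 1ns/1ps
// Sketch only: all names in this module are hypothetical.
module store_lane_example;
  // One write-enable bit per byte, selected by the access size and the
  // low two bits of the byte address.
  function [3:0] byte_wea;
    input [3:0] whbu;
    input [1:0] offset;
    begin
      case (whbu)
        4'b0010: byte_wea = 4'b0001 << offset;  // SB: one byte lane
        4'b0100: byte_wea = 4'b0011 << offset;  // SH: two lanes (offset 0 or 2)
        4'b1000: byte_wea = 4'b1111;            // SW: all four lanes
        default: byte_wea = 4'b0000;            // not a store
      endcase
    end
  endfunction

  // Move the store data onto the selected byte lanes (shift by 8 * offset bits).
  function [31:0] store_align;
    input [31:0] rs2_data;
    input [1:0]  offset;
    begin
      store_align = rs2_data << {offset, 3'b000};
    end
  endfunction

  initial begin
    if (byte_wea(4'b0010, 2'd2) === 4'b0100 &&
        store_align(32'h000000AB, 2'd2) === 32'h00AB0000)
      $display("PASS: sb at offset 2");
    else
      $display("FAIL: sb at offset 2");
    if (byte_wea(4'b0100, 2'd2) === 4'b1100 &&
        store_align(32'h00001234, 2'd2) === 32'h12340000)
      $display("PASS: sh at offset 2");
    else
      $display("FAIL: sh at offset 2");
    $finish;
  end
endmodule
```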

Source Code

1. ImmGen code:

module ImmGen(
  input  [2:0]   ImmSel,     
  input  [31:0]  inst_field, 
  output reg [31:0] Imm_out
);

  always @(*) begin
    case (ImmSel)
      3'b000: // I-type 
        Imm_out = {{20{inst_field[31]}}, inst_field[31:20]}; 

      3'b001: // S-type
        Imm_out = {{20{inst_field[31]}}, inst_field[31:25], inst_field[11:7]}; //

      3'b010: // B-type 
        Imm_out = {{19{inst_field[31]}}, inst_field[31], inst_field[7], inst_field[30:25], inst_field[11:8], 1'b0}; 

      3'b011: // J-type 
        Imm_out = {{11{inst_field[31]}}, inst_field[31], inst_field[19:12], inst_field[20], inst_field[30:21], 1'b0}; // 

      3'b100: // U_type
        Imm_out = {inst_field[31:12], 12'b0}; // high 20 bit Imm, low 12 bit 0

      default: Imm_out = 32'b0; //
    endcase
  end
endmodule

The main addition is the U-type immediate.
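
As a small worked example of the U-type path, the sketch below decodes a lui and an auipc instruction by hand. The module name imm_u_example, the two instruction words, and the PC value 0x40 are chosen here for illustration and are not taken from the lab test program.

```verilog
`timescale 1ns/1ps
// Sketch only: imm_u_example and its test values are hypothetical.
module imm_u_example;
  reg  [31:0] inst;
  reg  [31:0] pc;
  wire [31:0] imm_u     = {inst[31:12], 12'b0};  // U-type: upper 20 bits, low 12 bits zero
  wire [31:0] auipc_res = pc + imm_u;            // value auipc writes to rd

  initial begin
    pc   = 32'h00000040;                 // arbitrary example PC
    inst = 32'h123450B7;                 // lui x1, 0x12345
    #1;
    if (imm_u === 32'h12345000)
      $display("PASS: lui writes %h", imm_u);
    else
      $display("FAIL: imm_u = %h", imm_u);

    inst = 32'h12345097;                 // auipc x1, 0x12345
    #1;
    if (auipc_res === 32'h12345040)
      $display("PASS: auipc writes %h", auipc_res);
    else
      $display("FAIL: auipc_res = %h", auipc_res);
    $finish;
  end
endmodule
```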

2. SCPU_Ctrl code:

module SCPU_ctrls(
  input [4:0]       OPcode, 
  input [2:0]       Fun3,
  input             Fun7,
  input             MIO_ready,
  output reg [2:0]  ImmSel,
  output reg        ALUSrc_B,
  output reg [2:0]  MemtoReg,
  output reg [1:0]  Jump,
  output reg [2:0]  Branch,
  output reg        RegWrite,
  output reg        MemRW,
  output reg [3:0]  WHBU,
  output reg [3:0]  ALU_Control,
  output reg        CPU_MIO
);
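// Give every output a default at the top of the combinational block so that
// no latch is inferred and MemRW / WHBU cannot keep a stale value.
always @(*) begin
  ImmSel    = 3'b000;
  ALUSrc_B  = 1'b0;
  MemtoReg  = 3'b000; // 0: ALU result  1: load from RAM to reg  2: PC + 4  3: lui  4: auipc
  MemRW     = 1'b0;   // 1: write to RAM  0: read from RAM
  WHBU      = 4'b0;
  Jump      = 2'b00;
  Branch    = 3'b000;
  RegWrite  = 1'b0;

  ALU_Control = 4'b0000;
  //CPU_MIO   = 1'b0;
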
  case (OPcode)
    5'b01100: begin // R-type 
      WHBU      = 4'b0;
      ALUSrc_B = 1'b0;
      MemtoReg = 3'b000;
      RegWrite = 1'b1;
      //ImmSel   = 2'b00; // no imm
      //MemRW    = 1'bx; // no read or write 
      Jump     = 2'b00;
      Branch   = 3'b000;

      case ({Fun3, Fun7})
        4'b0000: ALU_Control = 4'b0000; // ADD
        4'b0001: ALU_Control = 4'b0001; // SUB
        4'b0010: ALU_Control = 4'b0010; // SLL
        4'b0100: ALU_Control = 4'b0011; // SLT
        4'b0110: ALU_Control = 4'b0100; // SLTU
        4'b1000: ALU_Control = 4'b0101; // XOR
        4'b1010: ALU_Control = 4'b0110; // SRL
        4'b1011: ALU_Control = 4'b0111; // SRA
        4'b1100: ALU_Control = 4'b1000; // OR
        4'b1110: ALU_Control = 4'b1001; // AND
      endcase
    end
    5'b00000: begin // Load 

      ALUSrc_B  = 1'b1;
      MemtoReg  = 3'b001;
      RegWrite  = 1'b1;
      ImmSel    = 3'b000;
      MemRW     = 1'b0; // read
      case (Fun3)
        3'b000: WHBU      = 4'b0010; // LB
        3'b001: WHBU      = 4'b0100; // LH
        3'b010: WHBU      = 4'b1000; // LW
        3'b100: WHBU      = 4'b0011; // LBU
        3'b101: WHBU      = 4'b0101; // LHU
      endcase
      Jump      = 2'b00;
      Branch    = 3'b000;
      ALU_Control = 4'b0000; // ADD for address calculation
    end
    5'b01000: begin // Store 
      ALUSrc_B  = 1'b1;
      //MemtoReg  = 2'b01;
      MemtoReg = 3'bx;
      //RegWrite  = 1'b1;
      RegWrite  = 1'b0;
      ImmSel    = 3'b001;
      MemRW     = 1'b1; // write
      case (Fun3)
        3'b000: WHBU      = 4'b0010; // SB
        3'b001: WHBU      = 4'b0100; // SH
        3'b010: WHBU      = 4'b1000; // SW
      endcase

      Jump      = 2'b00;
      Branch    = 3'b000; 
      ALU_Control = 4'b0000; // ADD for address calculation
    end
    5'b11000: begin // Branch
      WHBU      = 4'b0000;
      ALUSrc_B  = 1'b0;
      MemtoReg  = 3'bx; // new
      RegWrite  = 1'b0; //
      ImmSel    = 3'b010; 
      //MemRW     = 1'bx;

      //Branch    = 1'b1;
      Jump      = 2'b00;
      case (Fun3)
        3'b000: begin Branch = 3'b001;  ALU_Control = 4'd1; end // BEQ, do SUB in ALU 
        3'b001: begin Branch = 3'b010;  ALU_Control = 4'd1; end // BNE
        3'b100: begin Branch = 3'b011;  ALU_Control = 4'd3; end // BLT, do SLT in ALU
        3'b101: begin Branch = 3'b100;  ALU_Control = 4'd3; end // BGE
        3'b110: begin Branch = 3'b101;  ALU_Control = 4'd4; end // BLTU, do SLTU in ALU
        3'b111: begin Branch = 3'b110;  ALU_Control = 4'd4; end // BGEU 
        //default: begin Branch = 3'b000; ALU_Control = 4'd0; end
      endcase
    end
    5'b11011: begin // JAL 
      WHBU      = 4'b0000;
      //ALUSrc_B = 1'b1;
      ALUSrc_B = 1'bx;
      MemtoReg = 3'b010; // PC + 4
      RegWrite = 1'b1; //
      ImmSel   = 3'b011;
      // MemRW = 1'bx;

      Jump     = 2'b01;
      Branch   = 3'b000;
    end
    5'b00100: begin // I-type ALU
      WHBU      = 4'b0000; 
      ALUSrc_B  = 1'b1;
      MemtoReg  = 3'b000;
      RegWrite  = 1'b1;
      ImmSel    = 3'b000;
      //MemRW     = 1'bx; 

      Jump      = 2'b00;
      Branch    = 3'b000;
      case (Fun3)
        3'b000: ALU_Control = 4'b0000; // ADDI
        3'b001: ALU_Control = 4'b0010; // SLLI
        3'b010: ALU_Control = 4'b0011; // SLTI
        3'b011: ALU_Control = 4'b0100; // SLTIU
        3'b100: ALU_Control = 4'b0101; // XORI
        3'b101: ALU_Control = (Fun7) ? 4'b0111 : 4'b0110; // SRAI / SRLI
        3'b110: ALU_Control = 4'b1000; // ORI
        3'b111: ALU_Control = 4'b1001; // ANDI
      endcase
    end
    5'b11001: begin // I-type JALR
      WHBU      = 4'b0000;
      ALUSrc_B = 1'b1;
      MemtoReg = 3'b010; // PC + 4
      RegWrite = 1'b1; //
      ImmSel   = 3'b000;
      //MemRW    = 1'b0;

      MemRW    = 1'bx;
      Jump     = 2'b10;
      Branch   = 3'b000;
      ALU_Control = 4'd0; // ADD
    end
    5'b01101: begin // lui
      WHBU      = 4'b0000;
      ALUSrc_B = 1'bx;
      MemtoReg = 3'b011; // lui_res = Imm
      RegWrite = 1'b1; //
      ImmSel   = 3'b100; // U-type
      //MemRW    = 1'b0;

      MemRW    = 1'bx;
      Jump     = 2'b00;
      Branch   = 3'b000;
      ALU_Control = 4'dx; // don't care: the ALU result is not used by lui
    end
    5'b00101: begin // auipc
      WHBU      = 4'b0000;
      ALUSrc_B = 1'bx;
      MemtoReg = 3'b100; // auipc_res = PC + Imm
      RegWrite = 1'b1; //
      ImmSel   = 3'b100;
      //MemRW    = 1'b0;

      MemRW    = 1'bx;
      Jump     = 2'b00;
      Branch   = 3'b000;
      ALU_Control = 4'dx; // don't care: auipc uses the separate PC + Imm adder
    end
  endcase

  //CPU_MIO = MIO_ready;
end

endmodule

The main addition is the WHBU signal, whose four bits stand for Word, Half, Byte, and Unsigned and describe the access type of a load or store.
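
To illustrate how the WHBU encoding drives load extension, the sketch below applies the five load encodings to a word whose addressed byte or halfword has already been moved to the low bits. The module name load_extend_example and the function load_extend are hypothetical; the encodings (LB = 4'b0010, LH = 4'b0100, LW = 4'b1000, LBU = 4'b0011, LHU = 4'b0101) are the ones assigned in the Load case of SCPU_ctrls.

```verilog
`timescale 1ns/1ps
// Sketch only: all names in this module are hypothetical.
module load_extend_example;
  // data is assumed to be shifted already so the addressed bytes start at bit 0.
  function [31:0] load_extend;
    input [3:0]  whbu;
    input [31:0] data;
    begin
      case (whbu)
        4'b0010: load_extend = {{24{data[7]}},  data[7:0]};   // LB: sign-extend byte
        4'b0100: load_extend = {{16{data[15]}}, data[15:0]};  // LH: sign-extend halfword
        4'b1000: load_extend = data;                          // LW: full word
        4'b0011: load_extend = {24'b0, data[7:0]};            // LBU: zero-extend byte
        4'b0101: load_extend = {16'b0, data[15:0]};           // LHU: zero-extend halfword
        default: load_extend = 32'b0;                         // not a load
      endcase
    end
  endfunction

  initial begin
    if (load_extend(4'b0010, 32'h123456F8) === 32'hFFFFFFF8 &&
        load_extend(4'b0011, 32'h123456F8) === 32'h000000F8)
      $display("PASS: LB / LBU");
    else
      $display("FAIL: LB / LBU");
    if (load_extend(4'b0100, 32'h12348001) === 32'hFFFF8001 &&
        load_extend(4'b0101, 32'h12348001) === 32'h00008001)
      $display("PASS: LH / LHU");
    else
      $display("FAIL: LH / LHU");
    $finish;
  end
endmodule
```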

In addition, the Branch and Jump signals were widened and the U-type instructions were added.

3. DataPath code:

module DataPaths(
    input [2:0] ImmSel, input ALUSrc_B, input [2:0] MemtoReg, input [1:0] Jump, input [2:0] Branch, input RegWrite, input [3:0] ALU_Control,
    input [31:0] Data_in, input clk, input [31:0] inst_field, input rst, input [3:0] WHBU, 
    output [31:0] Reg00, output [31:0] Reg01, output [31:0] Reg02, output [31:0] Reg03,
    output [31:0] Reg04, output [31:0] Reg05, output [31:0] Reg06, output [31:0] Reg07,
    output [31:0] Reg08, output [31:0] Reg09, output [31:0] Reg10, output [31:0] Reg11,
    output [31:0] Reg12, output [31:0] Reg13, output [31:0] Reg14, output [31:0] Reg15,
    output [31:0] Reg16, output [31:0] Reg17, output [31:0] Reg18, output [31:0] Reg19,
    output [31:0] Reg20, output [31:0] Reg21, output [31:0] Reg22, output [31:0] Reg23,
    output [31:0] Reg24, output [31:0] Reg25, output [31:0] Reg26, output [31:0] Reg27,
    output [31:0] Reg28, output [31:0] Reg29, output [31:0] Reg30, output [31:0] Reg31,
    output [31:0] ALU_out, output reg [31:0] Data_out, output [31:0] PC_out,
    output reg [3:0] wea);

reg branch;
wire Branch_one;
wire [31:0] branch_out;
wire ALU_zero;
wire ALU_overflow;
wire [31:0] PCAddImm;
wire [31:0] ALU_A;
wire [31:0] ALU_B;
reg [31:0] reg_wt_data;
wire [31:0] PCAdd4;
reg [31:0] PC_in;
wire [31:0] ImmOut;
wire [1:0] LSwea;

reg [31:0] LoadData;
wire [31:0] ReadData;
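assign LSwea = ALU_out[1:0]; // byte offset of the effective address (rs1 + imm); ImmOut[1:0] alone is only right for word-aligned bases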
assign PCAdd4 = PC_out + 32'd4;
assign Branch_one = (Branch == 3'b000) ? 1'b0 : 1'b1;
assign PCAddImm = ImmOut + PC_out; // ALU ? Adder!

always @ (*) begin
    case (MemtoReg)
        3'd0: reg_wt_data = ALU_out; // ALU
        3'd1: begin
            case (WHBU)
                4'b0010: reg_wt_data = {{24{LoadData[7]}}, LoadData[7:0]}; // LB
                4'b0100: reg_wt_data = {{16{LoadData[15]}}, LoadData[15:0]}; // LH
                4'b1000: reg_wt_data = LoadData; // LW
                4'b0011: reg_wt_data = {24'b0, LoadData[7:0]}; // LBU
                4'b0101: reg_wt_data = {16'b0, LoadData[15:0]}; // LHU
                default: reg_wt_data = 0;
            endcase 
        end
        3'd2: reg_wt_data = PCAdd4; // JAL
        3'd3: reg_wt_data = ImmOut; // lui
        3'd4: reg_wt_data = PCAddImm; // auipc
        default: reg_wt_data = 32'b0;
    endcase      
end

always @ (*) begin  
    case (Branch)
        3'b001: branch = Branch_one & (ALU_zero); // BEQ
        3'b010: branch = Branch_one & (~ALU_zero); // BNE
        3'b011: branch = Branch_one & (ALU_out[0]); // BLT
        3'b100: branch = Branch_one & (~ALU_out[0]); // BGE
        3'b101: branch = Branch_one & (ALU_out[0]); // BLTU
        3'b110: branch = Branch_one & (~ALU_out[0]); // BGEU 
        default: branch = 1'b0;
    endcase
    case (Jump)
        2'b00: PC_in = branch_out;
        2'b01: PC_in = PCAddImm;
        2'b10: PC_in = ALU_out;
        default: PC_in = 32'b0;
    endcase
end
always @ (*) begin
    case (LSwea)
        2'b00: begin
            LoadData = Data_in;
            Data_out = ReadData;
            case (WHBU)
                4'b0010: wea = 4'b0001;
                4'b0100: wea = 4'b0011;
                4'b1000: wea = 4'b1111;
                default: wea = 0;
            endcase
        end
        2'b01: begin
            LoadData = {8'b0, Data_in[31:8]};
            Data_out = {ReadData[23:0], 8'b0};
            case (WHBU)
                4'b0010: wea = 4'b0010;
                4'b0100: wea = 4'b0110;
                default: wea = 0;
            endcase
        end
        2'b10: begin
            LoadData = {16'b0, Data_in[31:16]};
            Data_out = {ReadData[15:0], 16'b0};
            case (WHBU)
                4'b0010: wea = 4'b0100;
                4'b0100: wea = 4'b1100;
                default: wea = 0;
            endcase
        end
        2'b11: begin
            LoadData = {24'b0, Data_in[31:24]};
            Data_out = {ReadData[7:0], 24'b0};
            case (WHBU)
                4'b0010: wea = 4'b1000;
                default: wea = 0;
            endcase
        end

    endcase

end
assign ALU_B = ALUSrc_B ? ImmOut : ReadData;
assign branch_out = branch ? PCAddImm : PCAdd4;

ALU alu(.A(ALU_A), .B(ALU_B), .ALU_operation(ALU_Control), .res(ALU_out), 
    .zero(ALU_zero), .overflow(ALU_overflow));
wire [4:0] RS1, RS2, WT;
assign RS1 = inst_field[19:15];
assign RS2 = inst_field[24:20];
assign WT = inst_field[11:7];
Regs regs(.clk(clk), .rst(rst), .Rs1_addr(RS1), .Rs2_addr(RS2), 
    .Wt_addr(WT), .Wt_data(reg_wt_data), .RegWrite(RegWrite), 
    .Rs1_data(ALU_A), .Rs2_data(ReadData),
    .Reg00(Reg00),.Reg01(Reg01),.Reg02(Reg02),.Reg03(Reg03),
    .Reg04(Reg04),.Reg05(Reg05),.Reg06(Reg06),.Reg07(Reg07),
    .Reg08(Reg08),.Reg09(Reg09),.Reg10(Reg10),.Reg11(Reg11),
    .Reg12(Reg12),.Reg13(Reg13),.Reg14(Reg14),.Reg15(Reg15),
    .Reg16(Reg16),.Reg17(Reg17),.Reg18(Reg18),.Reg19(Reg19),
    .Reg20(Reg20),.Reg21(Reg21),.Reg22(Reg22),.Reg23(Reg23),
    .Reg24(Reg24),.Reg25(Reg25),.Reg26(Reg26),.Reg27(Reg27),
    .Reg28(Reg28),.Reg29(Reg29),.Reg30(Reg30),.Reg31(Reg31));

ImmGen immgen(.ImmSel(ImmSel), .inst_field(inst_field), .Imm_out(ImmOut));

Reg PC(.clk(clk), .rst(rst), .CE(1'b1), .D(PC_in), .Q(PC_out));

endmodule
The main change is the added selection logic for B-type and J-type instructions, which introduces additional multiplexers.
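
The next-PC selection can be summarized with a small Python model. This is only a sketch of the multiplexer behavior described by the `Branch` and `Jump` case statements in the DataPath above; the function names `branch_taken` and `next_pc` are hypothetical, and all values are treated as unsigned 32-bit integers.

```python
MASK = 0xFFFFFFFF

def branch_taken(branch: int, alu_zero: bool, alu_bit0: int) -> bool:
    """Mirror the Branch case statement: decide whether a branch is taken."""
    if branch == 0b001:               # BEQ: ALU computes rs1 - rs2
        return alu_zero
    if branch == 0b010:               # BNE
        return not alu_zero
    if branch in (0b011, 0b101):      # BLT / BLTU: ALU computes SLT / SLTU
        return alu_bit0 == 1
    if branch in (0b100, 0b110):      # BGE / BGEU: inverse of the above
        return alu_bit0 == 0
    return False                      # 0b000 (not a branch) or unused codes

def next_pc(pc: int, imm: int, jump: int, branch: int,
            alu_out: int, alu_zero: bool) -> int:
    """Mirror the Jump case statement: pick the value loaded into PC."""
    if jump == 0b01:                  # JAL: PC + Imm
        return (pc + imm) & MASK
    if jump == 0b10:                  # JALR: ALU result (rs1 + Imm)
        return alu_out & MASK
    if jump == 0b00:                  # sequential or conditional branch
        if branch_taken(branch, alu_zero, alu_out & 1):
            return (pc + imm) & MASK
        return (pc + 4) & MASK
    return 0                          # default arm of the case statement
```

A negative offset is passed as its 32-bit two's-complement value, so a backward jump wraps around correctly under the mask.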

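The signals for reading from and writing to RAM were also changed to honor the new WHBU signal:

For data written to RAM, ReadData holds the value read from the RegFile; Data_out holds that value shifted left by the byte offset, and this shifted value is what is driven out and written to RAM.

For data loaded into a register, Data_in holds the word read from RAM; it is shifted right by the byte offset to form LoadData, which is then sign- or zero-extended and written to the RegFile.

A new 4-bit enable signal, wea, is also added, with one bit for each of the 4 bytes of a RAM word.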
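
The byte-lane handling above can be sketched in Python as follows. This is a behavioral model under stated assumptions, not the author's implementation: `store_lanes` and `load_value` are hypothetical names, `offset` is the byte offset within the word (0 to 3), and `width` is the access size in bytes. As in the Verilog, an access that would cross the word boundary gets `wea = 0`.

```python
MASK = 0xFFFFFFFF

def store_lanes(rs2: int, offset: int, width: int):
    """Return (Data_out, wea) for a store of `width` bytes at `offset`."""
    data_out = (rs2 << (8 * offset)) & MASK      # shift value into its byte lanes
    if offset + width > 4:
        return data_out, 0b0000                  # would cross the word: no lane enabled
    wea = ((1 << width) - 1) << offset           # one enable bit per written byte
    return data_out, wea

def load_value(word: int, offset: int, width: int, unsigned: bool) -> int:
    """Return the register value produced by a load from a 32-bit RAM word."""
    load_data = (word & MASK) >> (8 * offset)    # LoadData: shift down to bit 0
    bits = 8 * width
    value = load_data & ((1 << bits) - 1)
    if not unsigned and (value >> (bits - 1)) & 1:
        value |= MASK & ~((1 << bits) - 1)       # sign-extend to 32 bits
    return value
```

With the values used by the test program, `load_value(0xA0000000, 3, 1, False)` gives `0xFFFFFFA0` (the `lb` at offset 11), and `store_lanes(0xD0, 2, 1)` gives `(0x00D00000, 0b0100)` (the `sb` at offset 2).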

Key simulation steps

The simulation testbench is the same as in Lab 4-2.

Simulation test program:

    auipc x1, 0            # 00
    j     start            # 04
dummy:
    nop                    # 08
    nop                    # 0C
    nop                    # 10
    nop                    # 14
    nop                    # 18
    nop                    # 1C
    nop                    # 20
    j     dummy            # 24

start:
    bnez  x1, dummy
    beq   x0, x0, pass_0
    li    x31, 0
    auipc x30, 0
    j     dummy
pass_0:
    li    x31, 1
    bne   x0, x0, dummy
    bltu  x0, x0, dummy
    li    x1, -1           # x1=FFFFFFFF
    xori  x3, x1, 1        # x3=FFFFFFFE
    add   x3, x3, x3       # x3=FFFFFFFC
    add   x3, x3, x3       # x3=FFFFFFF8
    add   x3, x3, x3       # x3=FFFFFFF0
    add   x3, x3, x3       # x3=FFFFFFE0
    add   x3, x3, x3       # x3=FFFFFFC0
    add   x3, x3, x3       # x3=FFFFFF80
    add   x3, x3, x3       # x3=FFFFFF00
    add   x3, x3, x3       # x3=FFFFFE00
    add   x3, x3, x3       # x3=FFFFFC00
    add   x3, x3, x3       # x3=FFFFF800
    add   x3, x3, x3       # x3=FFFFF000
    add   x3, x3, x3       # x3=FFFFE000
    add   x3, x3, x3       # x3=FFFFC000
    add   x3, x3, x3       # x3=FFFF8000
    add   x3, x3, x3       # x3=FFFF0000
    add   x3, x3, x3       # x3=FFFE0000
    add   x3, x3, x3       # x3=FFFC0000
    add   x3, x3, x3       # x3=FFF80000
    add   x3, x3, x3       # x3=FFF00000
    add   x3, x3, x3       # x3=FFE00000
    add   x3, x3, x3       # x3=FFC00000
    add   x3, x3, x3       # x3=FF800000
    add   x3, x3, x3       # x3=FF000000
    add   x3, x3, x3       # x3=FE000000
    add   x3, x3, x3       # x3=FC000000
    add   x5, x3, x3       # x5=F8000000
    add   x3, x5, x5       # x3=F0000000
    add   x4, x3, x3       # x4=E0000000
    add   x6, x4, x4       # x6=C0000000
    add   x7, x6, x6       # x7=80000000
    ori   x8, zero, 1      # x8=00000001
    ori   x28, zero, 31
    srl   x29, x7, x28     # x29=00000001
    auipc x30, 0
    bne   x8, x29, dummy
    auipc x30, 0
    blt   x8, x7, dummy
    sra   x29, x7, x28     # x29=FFFFFFFF
    and   x29, x29, x3     # x29=x3=F0000000
    auipc x30, 0
    bne   x3, x29, dummy
    mv    x29, x8          # x29=x8=00000001
    bltu  x29, x7, pass_1  # unsigned 00000001 < 80000000
    auipc x30, 0
    j     dummy

pass_1:
    nop
    li    x31, 2
    sub   x3, x6, x7       # x3=40000000
    sub   x4, x7, x3       # x4=40000000
    slti  x9, x0, 1        # x9=00000001
    slt   x10, x3, x4
    slt   x10, x4, x3      # x10=00000000
    auipc x30, 0
    beq   x9, x10, dummy   # branch when x3 != x4
    srli  x29, x3, 30      # x29=00000001
    beq   x29, x9, pass_2
    auipc x30, 0
    j     dummy

pass_2:
    nop
# Test set-less-than
    li    x31, 3
    slti  x10, x1, 3       # x10=00000001
    slt   x11, x5, x1      # signed(0xF8000000) < -1
                        # x11=00000001
    slt   x12, x1, x3      # x12=00000001
    andi  x10, x10, 0xff
    and   x10, x10, x11
    and   x10, x10, x12    # x10=00000001
    auipc x30, 0
    beqz  x10, dummy
    sltu  x10, x1, x8      # unsigned FFFFFFFF < 00000001 ?
    auipc x30, 0
    bnez  x10, dummy
    sltu  x10, x8, x3      # unsigned 00000001 < F0000000 ?
    auipc x30, 0
    beqz  x10, dummy
    sltiu x10, x1, 3
    auipc x30, 0
    bnez  x10, dummy
    li    x11, 1
    bne   x10, x11, pass_3
    auipc x30, 0
    j     dummy

pass_3:
    nop
    li    x31, 4
    or    x11, x7, x3      # x11=C0000000
    beq   x11, x6, pass_4
    auipc x30, 0
    j     dummy

pass_4:
    nop
    li    x31, 5
    li    x18, 0x20        # base addr=00000020
### uncomment instr. below when simulating on venus
    # lui   x18, 0x10000     # base addr=10000000
    sw    x5, 0(x18)       # mem[0x20]=F8000000
    sw    x4, 4(x18)       # mem[0x24]=40000000
    lw    x27, 0(x18)      # x27=mem[0x20]=F8000000
    xor   x27, x27, x5     # x27=00000000
    sw    x6, 0(x18)       # mem[0x20]=C0000000
    lw    x28, 0(x18)      # x28=mem[0x20]=C0000000
    xor   x27, x6, x28     # x27=00000000
    auipc x30, 0
    bnez  x27, dummy
    lui   x20, 0xA0000     # x20=A0000000
    sw    x20, 8(x18)      # mem[0x28]=A0000000
    lui   x27, 0xFEDCB     # x27=FEDCB000
    srai  x27, x27, 12     # x27=FFFFEDCB
    li    x28, 8
    sll   x27, x27, x28    # x27=FFEDCB00
    ori   x27, x27, 0xff   # x27=FFEDCBFF
    lb    x29, 11(x18)     # x29=FFFFFFA0, little-endian, signed-ext
    and   x27, x27, x29    # x27=FFEDCBA0
    sw    x27, 8(x18)      # mem[0x28]=FFEDCBA0
    lhu   x27, 8(x18)      # x27=0000CBA0
    lui   x20, 0xFFFF0     # x20=FFFF0000
    and   x20, x20, x27    # x20=00000000
    auipc x30, 0
    bnez  x20, dummy       # check unsigned-ext
    li    x31, 6
    lbu   x28, 10(x18)     # x28=000000ED
    lbu   x29, 11(x18)     # x29=000000FF
    slli  x29, x29, 8      # x29=0000FF00
    or    x29, x29, x28    # x29=0000FFED
    slli  x29, x29, 16
    or    x29, x27, x29    # x29=FFEDCBA0
    lw    x28, 8(x18)      # x28=FFEDCBA0
    auipc x30, 0
    bne   x28, x29, dummy
    sw    x0, 0(x18)       # mem[0x20]=00000000
    sh    x27, 0(x18)      # mem[0x20]=0000CBA0
    li    x28, 0xD0
    sb    x28, 2(x18)      # mem[0x20]=00D0CBA0
    lw    x28, 0(x18)      # x28=00D0CBA0
    li    x29, 0x00D0CBA0
    auipc x30, 0
    bne   x28, x29, dummy
    lh    x27, 2(x18)      # x27=000000D0
    li    x28, 0xD0
    auipc x30, 0
    bne   x27, x28, dummy

pass_5:
    li    x31, 7
    auipc x30, 0
    bge   x1, x0, dummy    # -1 >= 0 ?
    bge   x8, x1, pass_6   # 1 >= -1 ?
    auipc x30, 0
    j     dummy

pass_6:
    auipc x30, 0
    bgeu  x0, x1, dummy    # 0 >= FFFFFFFF ?
    auipc x30, 0
    bgeu  x8, x1, dummy
    auipc x20, 0
    jalr  x21, x0, pass_7  # just for test : (
    auipc x30, 0
    j     dummy

pass_7:
# jalr ->
    addi  x20, x20, 8
    auipc x30, 0
    bne   x20, x21, dummy
    li    x31, 0x666
    j     dummy

The program extends the Lab 4-2 test with the newly added instructions; as before, it enters the dummy loop once all tests have passed.
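
The register values annotated in the test program can be cross-checked with a few lines of Python. The sketch below models only 32-bit wrap-around addition and the two right shifts; the helper names (`add32`, `srl`, `sra`, `double_n`) are hypothetical and are not part of the design.

```python
MASK = 0xFFFFFFFF

def add32(a: int, b: int) -> int:
    """32-bit addition with wrap-around, as performed by `add`."""
    return (a + b) & MASK

def srl(a: int, shamt: int) -> int:
    """Logical right shift: zeros are shifted in from the left."""
    return (a & MASK) >> (shamt & 31)

def sra(a: int, shamt: int) -> int:
    """Arithmetic right shift: the sign bit is replicated."""
    signed = a - (1 << 32) if a & 0x80000000 else a
    return (signed >> (shamt & 31)) & MASK

def double_n(value: int, n: int) -> int:
    """Apply `add x, x, x` n times."""
    for _ in range(n):
        value = add32(value, value)
    return value

x3 = 0xFFFFFFFF ^ 1          # xori x3, x1, 1  -> FFFFFFFE
x5 = double_n(x3, 26)        # value annotated as F8000000
x7 = double_n(x3, 30)        # value annotated as 80000000
```

Shifting `x7` right by 31 then yields 1 with `srl` and `0xFFFFFFFF` with `sra`, which is what the `bne x8, x29, dummy` check and the following `and` rely on.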

Results and analysis

Simulation results

alt text

alt text

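Because the simulation waveform is too long, only its beginning and end are shown.

Reg31 ends up holding 0x666, which shows that all the preceding tests passed and the program entered the dummy loop, as expected.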

Lab 4-4

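Methods and experimental steps

Code design and description

1. CSR registers and their instructions

Five trap-related registers are implemented:

mstatus: Machine Status Register, holds the current control state. Bit 3 (MIE) is the global interrupt enable: in this design it is 1 during normal execution, is cleared to 0 on entry to exception/interrupt handling, and is set back to 1 by mret.

mtvec: Machine Trap-Vector Base-Address Register, holds the base address of the trap vector.

mcause: Machine Cause Register, holds the cause of the current trap. If the trap was caused by an interrupt, the most significant bit (the Interrupt bit) is set to 1; for an exception it is 0.

mtval: Machine Trap Value Register, holds exception-specific information to help software handle the trap; formerly called mbadaddr.

mepc: Machine Exception Program Counter, holds the address of the instruction that was about to execute when the trap was triggered; mret uses it as the return address.

There are also six instructions, csrrw, csrrs, csrrc, csrrwi, csrrsi, and csrrci, which modify these five registers directly.

To this end, a CSRRegs module is implemented to manage these registers.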
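
As a reference for what the six instructions do, here is a simplified Python sketch of their semantics: each one returns the old CSR value to rd and computes a new CSR value by writing, setting, or clearing bits. The function name `csr_instr` is hypothetical, and the model deliberately ignores the ISA rule that suppresses the write when the source is x0 or a zero immediate.

```python
MASK = 0xFFFFFFFF

def csr_instr(funct3: int, csr_old: int, rs1_val: int, uimm: int):
    """Return (rd_value, csr_new) for a CSR instruction.

    funct3: 001 csrrw, 010 csrrs, 011 csrrc,
            101 csrrwi, 110 csrrsi, 111 csrrci
    """
    # funct3[2] selects the 5-bit zero-extended immediate instead of rs1
    src = (uimm & 0x1F) if funct3 & 0b100 else (rs1_val & MASK)
    op = funct3 & 0b011
    if op == 0b01:                       # write
        csr_new = src
    elif op == 0b10:                     # set bits
        csr_new = csr_old | src
    elif op == 0b11:                     # clear bits
        csr_new = csr_old & ~src & MASK
    else:
        raise ValueError("funct3 does not encode a CSR access")
    return csr_old, csr_new
```

For instance, `csr_instr(0b110, 0, 0, 8)` models `csrrsi` with immediate 8 on a CSR holding 0: rd receives 0 and the CSR becomes 8.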

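2. Exception and interrupt handling

Two instructions need to be implemented:

ecall: the software-interrupt (environment call) instruction

mret: the instruction that returns from an exception or interrupt

An external interrupt signal is also added to support hardware interrupts.

To this end, an RV_INT module is implemented to generate the bypass inputs to the CSRRegs module and the signal that redirects the PC.

SCPU_Ctrl is also modified to decode the new instructions and generate the trap signals.

3. Trap handler

Once exception or interrupt handling is triggered, execution enters the trap handler.

The handler reads mepc, mcause, mtval, mstatus, and mtvec into general-purpose registers. To avoid losing live values, it uses registers that the preceding code does not use.

It then reads mepc and adjusts it:

For an exception (illegal instruction), mepc <- mepc + 4

For an interrupt, if it is the software interrupt ecall, mepc <- mepc + 4

If it is a hardware interrupt, mepc <- mepc. (mcause is used to tell the cases apart.)

Finally, mret is executed to return to the original program. (At this point, the information saved on entry to the handler must be restored.)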
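
The mepc adjustment rule above fits in a few lines. The following Python sketch (hypothetical function name `return_pc`) assumes the mcause convention used in this report, where the most significant bit distinguishes interrupts from exceptions.

```python
INTERRUPT_BIT = 1 << 31

def return_pc(mepc: int, mcause: int) -> int:
    """Return the address the trap handler should write back to mepc."""
    if mcause & INTERRUPT_BIT:
        # Hardware interrupt: the interrupted instruction has not executed yet,
        # so return to it unchanged.
        return mepc
    # ecall or illegal instruction: skip the trapping instruction.
    return (mepc + 4) & 0xFFFFFFFF
```

For a trap taken at address 0x40, an external interrupt returns to 0x40, while an ecall or an illegal instruction returns to 0x44.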

Source code

1. CSRRegs code:

module CSRRegs(
    input clk,
    input rst,
    input [11:0] raddr,                
    input [11:0] waddr,                
    input [31:0] wdata,                
    input csr_w,                       
    input [1:0] csr_wsc_mode,          
    input expt_int,               
    input [31:0] mepc_bypass_in,  
    input [31:0] mcause_bypass_in,
    input [31:0] mtval_bypass_in,
    input [31:0] mstatus_bypass_in,    
    output reg [31:0] rdata,     
    output reg [31:0] mstatus,
    output reg [31:0] mtvec,
    output reg [31:0] mepc,
    output reg [31:0] mcause,
    output reg [31:0] mtval
);
    //reg [31:0] csr [4095:0]; 
    localparam CSR_WSC_WRITE   = 2'b00; 
    localparam CSR_WSC_SET     = 2'b01; 
    localparam CSR_WSC_CLEAR   = 2'b10; 

    localparam CSR_MSTATUS     = 12'h300;
    localparam CSR_MTVEC       = 12'h305;
    localparam CSR_MEPC        = 12'h341;
    localparam CSR_MCAUSE      = 12'h342;
    localparam CSR_MTVAL       = 12'h343;

    always @(*) begin
        case (raddr)
            CSR_MSTATUS: rdata = mstatus;
            CSR_MTVEC: rdata = mtvec;
            CSR_MEPC: rdata = mepc;
            CSR_MCAUSE: rdata = mcause;
            CSR_MTVAL: rdata = mtval;
            default: rdata = 32'd0; 
        endcase
    end
    always @(posedge clk or posedge rst) begin
        if (rst) begin
            mstatus <= 32'd8;
            mtvec <= 32'h7C;
            mepc <= 32'b0;
            mcause <= 32'b0;
            mtval <= 32'b0;
        end else begin
            if (expt_int) begin

                mepc <= mepc_bypass_in;
                mcause <= mcause_bypass_in;
                mtval <= mtval_bypass_in;
                mstatus <= mstatus_bypass_in;
            end else if (csr_w) begin
                case (waddr)
                    CSR_MSTATUS: begin
                        case (csr_wsc_mode)
                            CSR_WSC_WRITE: mstatus <= wdata;
                            CSR_WSC_SET:   mstatus <= mstatus | wdata;
                            CSR_WSC_CLEAR: mstatus <= mstatus & ~wdata;
                            default: mstatus <= mstatus;
                        endcase
                    end
                    CSR_MTVEC: begin
                        case (csr_wsc_mode)
                            CSR_WSC_WRITE: mtvec <= wdata;
                            CSR_WSC_SET:   mtvec <= mtvec | wdata;
                            CSR_WSC_CLEAR: mtvec <= mtvec & ~wdata;
                            default: mtvec <= mtvec;
                        endcase
                    end
                    CSR_MEPC: begin
                        case (csr_wsc_mode)
                            CSR_WSC_WRITE: mepc <= wdata;
                            CSR_WSC_SET:   mepc <= mepc | wdata;
                            CSR_WSC_CLEAR: mepc <= mepc & ~wdata;
                            default: mepc <= mepc;
                        endcase
                    end
                    CSR_MCAUSE: begin
                        case (csr_wsc_mode)
                            CSR_WSC_WRITE: mcause <= wdata;
                            CSR_WSC_SET:   mcause <= mcause | wdata;
                            CSR_WSC_CLEAR: mcause <= mcause & ~wdata;
                            default: mcause <= mcause;
                        endcase
                    end
                    CSR_MTVAL: begin
                        case (csr_wsc_mode)
                            CSR_WSC_WRITE: mtval <= wdata;
                            CSR_WSC_SET:   mtval <= mtval | wdata;
                            CSR_WSC_CLEAR: mtval <= mtval & ~wdata;
                            default: mtval <= mtval;
                        endcase
                    end
                    default: mtval <= mtval;
                endcase

            end
        end
    end
endmodule

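This module lets the six CSR instructions modify the registers; in addition, when an exception or interrupt is triggered, the bypass inputs update mepc, mcause, mtval, and mstatus in the same clock cycle (mtvec is left unchanged).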
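
The update priority in CSRRegs (trap bypass first, ordinary CSR write second) can be modeled in Python as below. This is a sketch of one clock edge under the reset values and addresses shown in the module; `reset_state` and `clock_edge` are hypothetical names, and the state is kept in a plain dictionary.

```python
MASK = 0xFFFFFFFF
CSR_NAMES = {0x300: "mstatus", 0x305: "mtvec", 0x341: "mepc",
             0x342: "mcause", 0x343: "mtval"}

def reset_state() -> dict:
    """Register values after rst, as in the module's reset branch."""
    return {"mstatus": 8, "mtvec": 0x7C, "mepc": 0, "mcause": 0, "mtval": 0}

def clock_edge(state: dict, expt_int: bool = False, bypass: dict = None,
               csr_w: bool = False, waddr: int = 0, wdata: int = 0,
               mode: int = 0) -> dict:
    """Return the register state after one rising clock edge."""
    nxt = dict(state)
    if expt_int:
        # Trap entry or mret: the bypass values win over any CSR write.
        for name in ("mepc", "mcause", "mtval", "mstatus"):
            nxt[name] = bypass[name] & MASK
    elif csr_w and waddr in CSR_NAMES:
        name = CSR_NAMES[waddr]
        old = state[name]
        if mode == 0b00:                 # CSR_WSC_WRITE
            nxt[name] = wdata & MASK
        elif mode == 0b01:               # CSR_WSC_SET
            nxt[name] = (old | wdata) & MASK
        elif mode == 0b10:               # CSR_WSC_CLEAR
            nxt[name] = old & ~wdata & MASK
    return nxt
```

Calling `clock_edge` with both `expt_int=True` and `csr_w=True` leaves the addressed CSR untouched by the write, which matches the `if (expt_int) ... else if (csr_w)` structure of the Verilog.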

2. RV_INT code:

module RV_INT (
    input        clk,
    input        rst,
    input        INT,                
    input        ecall,              
    input        mret,               
    input        illegal_inst,       
    input [31:0] mstatus,
    input [31:0] mtvec,
    input [31:0] mepc,
    input [31:0] inst,
    input [31:0] pc_current,         
    output       en,                 
    output reg   expt_int, 
    output reg   pc_change,
    output reg [31:0] mepc_bypass_in, 
    output reg [31:0] mcause_bypass_in,  
    output reg [31:0] mtval_bypass_in,
    output reg [31:0] mstatus_bypass_in,
    output reg [31:0] pc             
);


    localparam MCAUSE_INT_EXTERNAL   = 32'h8000000B; 
    localparam MCAUSE_EXC_ECALL      = 32'h0000000B; // ECALL 
    localparam MCAUSE_EXC_ILLEGAL    = 32'h00000002; 
//    localparam MCAUSE_EXC_L_ACCESS   = 32'h00000005; 
//    localparam MCAUSE_EXC_J_ACCESS   = 32'h00000007; 
    localparam MSTATUS_ENABLE        = 32'h00000008;
    localparam MSTATUS_UNABLE        = 32'h00000000;
    // CSRRegs
    wire [31:0] csr_rdata;
    reg [31:0] csr_wdata;
    reg [11:0] csr_raddr, csr_waddr;
    reg csr_w;
    reg [1:0] csr_wsc_mode;
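    always @(*) begin
        // Defaults so every output is driven on every path (avoids inferred latches)
        expt_int = 1'b0;
        pc_change = 1'b0;
        pc = pc_current;
        mepc_bypass_in = 32'b0;
        mcause_bypass_in = 32'b0;
        mtval_bypass_in = 32'b0;
        mstatus_bypass_in = mstatus;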
        if(rst) begin
            expt_int = 1'b0;
            mepc_bypass_in = 32'b0;
            mcause_bypass_in = 32'b0;
            mtval_bypass_in = 32'b0;
            mstatus_bypass_in = MSTATUS_ENABLE;
            pc = pc_current;
            pc_change = 1'b0;
        end
        else begin
            if(mstatus == MSTATUS_ENABLE) begin // enabled
                if (INT) begin
                    expt_int = 1'b1;
                    pc = mtvec;
                    pc_change = 1'b1;
                    mcause_bypass_in = MCAUSE_INT_EXTERNAL;
                    mstatus_bypass_in = MSTATUS_UNABLE;
                    mepc_bypass_in = pc_current;
                    mtval_bypass_in = 32'b0;
                end else if (ecall) begin
                    expt_int = 1'b1;
                    pc = mtvec;
                    pc_change = 1'b1;
                    mcause_bypass_in = MCAUSE_EXC_ECALL;
                    mstatus_bypass_in = MSTATUS_UNABLE;
                    mepc_bypass_in = pc_current;
                    mtval_bypass_in = 32'b0;
                end else if (illegal_inst) begin
                    expt_int = 1'b1;
                    pc = mtvec;
                    pc_change = 1'b1;
                    mcause_bypass_in = MCAUSE_EXC_ILLEGAL;
                    mstatus_bypass_in = MSTATUS_UNABLE;
                    mepc_bypass_in = pc_current;
                    mtval_bypass_in = inst;
                end 
                else begin
                    expt_int = 1'b0;
                    pc_change = 1'b0;
                end
            end
            else begin
                if (mret) begin
                    expt_int = 1'b1;
                    pc = mepc;
                    pc_change = 1'b1;
                    mcause_bypass_in = 32'b0;
                    mstatus_bypass_in = MSTATUS_ENABLE; // clear mark
                    mepc_bypass_in = 32'b0;
                    mtval_bypass_in = 32'b0;
                end        
                else begin
                    expt_int = 1'b0;
                    pc_change = 1'b0;
                end
            end
        end
    end
    assign en = ~expt_int;
endmodule
This module generates the bypass outputs and the PC redirect signal, which is passed to the DataPath.
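
The decision logic of RV_INT can be restated as a short Python model. This is a sketch of the combinational priority only (external interrupt, then ecall, then illegal instruction while traps are enabled; mret while they are disabled); the function name `rv_int` and the dictionary return value are hypothetical conveniences.

```python
MSTATUS_ENABLE = 0x00000008
MSTATUS_UNABLE = 0x00000000

def rv_int(mstatus: int, mtvec: int, mepc: int, pc_current: int, inst: int,
           INT: bool = False, ecall: bool = False,
           mret: bool = False, illegal_inst: bool = False):
    """Return None when nothing happens, else the new PC and bypass values."""
    if mstatus == MSTATUS_ENABLE:
        if INT:                          # highest priority: external interrupt
            cause, tval = 0x8000000B, 0
        elif ecall:
            cause, tval = 0x0000000B, 0
        elif illegal_inst:
            cause, tval = 0x00000002, inst   # mtval records the offending instruction
        else:
            return None
        return {"pc": mtvec, "mepc": pc_current, "mcause": cause,
                "mtval": tval, "mstatus": MSTATUS_UNABLE}
    if mret:                             # only honored while inside a handler
        return {"pc": mepc, "mepc": 0, "mcause": 0,
                "mtval": 0, "mstatus": MSTATUS_ENABLE}
    return None
```

A non-None result corresponds to `expt_int = 1` and `pc_change = 1` in the Verilog; `None` corresponds to both being 0.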

The mcause values follow the standard encoding, namely:

localparam MCAUSE_INT_EXTERNAL   = 32'h8000000B; 
localparam MCAUSE_EXC_ECALL      = 32'h0000000B; // ECALL 
localparam MCAUSE_EXC_ILLEGAL    = 32'h00000002; 

A most significant bit of 1 indicates an interrupt (here, the external interrupt); 0 indicates an exception.
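
For reference, the three mcause values can be decoded as follows. This Python sketch uses a hypothetical helper `describe_mcause`; the descriptions follow the machine-level cause codes of the RISC-V privileged specification, restricted to the three causes this design generates.

```python
def describe_mcause(mcause: int):
    """Split mcause into (is_interrupt, code, description)."""
    is_interrupt = bool((mcause >> 31) & 1)   # bit 31: Interrupt bit
    code = mcause & 0x7FFFFFFF                # remaining bits: cause code
    known = {
        (True, 11): "machine external interrupt",
        (False, 11): "environment call from M-mode",
        (False, 2): "illegal instruction",
    }
    return is_interrupt, code, known.get((is_interrupt, code), "not used in this design")
```

Note that the external interrupt and ecall share cause code 11 and differ only in the Interrupt bit.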

3. SCPU_Ctrl code:

module SCPU_ctrls(
  input [4:0]       OPcode, 
  input [2:0]       Fun3,
  input             Fun7,
  input             MIO_ready,
  input [6:0]       High7,
  output reg [2:0]  ImmSel,
  output reg        ALUSrc_B,
  output reg [2:0]  MemtoReg,
  output reg [1:0]  Jump,
  output reg [2:0]  Branch,
  output reg        RegWrite,
  output reg        MemRW,
  output reg [3:0]  WHBU,
  output reg [3:0]  ALU_Control,
  output reg        csr_w,     
  output reg        is_csri,
  output reg [1:0]  csr_wsc_mode,
  output reg        mret,
  output reg        ecall,
  output reg        illegal, 
  output reg        CPU_MIO
);
initial begin
  ImmSel    = 3'b000;
  ALUSrc_B  = 1'b0;
  MemtoReg  = 3'b000; // 0: ALU result, 1: load from RAM, 2: PC + 4 (JAL/JALR), 3: lui imm, 4: auipc, 5: CSR
  MemRW     = 1'b0; // write to/read from RAM?
  WHBU      = 4'b0;
  Jump      = 2'b00;
  Branch    = 3'b000;
  RegWrite  = 1'b0;

  ALU_Control = 4'b0000;
  //CPU_MIO   = 1'b0;
end
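always @(*) begin
  // Defaults so every output is driven on every path (avoids inferred latches,
  // e.g. csr_w, mret or ecall staying high after a CSR-opcode instruction)
  ImmSel = 3'b000; ALUSrc_B = 1'b0; MemtoReg = 3'b000; Jump = 2'b00;
  Branch = 3'b000; RegWrite = 1'b0; MemRW = 1'b0; WHBU = 4'b0000;
  ALU_Control = 4'b0000; csr_w = 1'b0; is_csri = 1'b0; csr_wsc_mode = 2'b00;
  mret = 1'b0; ecall = 1'b0; illegal = 1'b0;
  case (OPcode)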
    5'b01100: begin // R-type 
      is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b00;  
      illegal  = 1'b0;
      WHBU     = 4'b0;
      ALUSrc_B = 1'b0;
      MemtoReg = 3'b000;
      RegWrite = 1'b1;
      //ImmSel   = 2'b00; // no imm
      //MemRW    = 1'bx; // no read or write 
      Jump     = 2'b00;
      Branch   = 3'b000;

      case ({Fun3, Fun7})
        4'b0000: ALU_Control = 4'b0000; // ADD
        4'b0001: ALU_Control = 4'b0001; // SUB
        4'b0010: ALU_Control = 4'b0010; // SLL
        4'b0100: ALU_Control = 4'b0011; // SLT
        4'b0110: ALU_Control = 4'b0100; // SLTU
        4'b1000: ALU_Control = 4'b0101; // XOR
        4'b1010: ALU_Control = 4'b0110; // SRL
        4'b1011: ALU_Control = 4'b0111; // SRA
        4'b1100: ALU_Control = 4'b1000; // OR
        4'b1110: ALU_Control = 4'b1001; // AND
        default: ALU_Control = 4'b0000;
      endcase
    end
    5'b11100: begin // I-type csr
      illegal  = 1'b0;
      WHBU     = 4'b0;
      ALUSrc_B = 1'b0;
      MemtoReg = 3'b101;
      RegWrite = 1'b1; // write into rd
      ImmSel   = 3'b101;
      Jump     = 2'b00;
      Branch   = 3'b000;
      csr_w    = 1'b1;
      case (Fun3)
        3'b000: begin
          //expt_int = 1'b1;
          if (High7 == 7'b0011000) begin // mret
              mret = 1'b1;
              ecall = 1'b0;
          end
          else begin // ecall
              mret = 1'b0;
              ecall = 1'b1;
          end
        end
        3'b001: begin is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b00;end// csrrw
        3'b010: begin is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b01;end// csrrs
        3'b011: begin is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b10;end// csrrc
        3'b101: begin is_csri = 1'b1; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b00;end// csrrwi
        3'b110: begin is_csri = 1'b1; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b01;end// csrrsi
        3'b111: begin is_csri = 1'b1; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b10;end// csrrci
        default: begin is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b00;end
      endcase
    end
    5'b00000: begin // Load 
      is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b00;
      illegal  = 1'b0;
      ALUSrc_B  = 1'b1;
      MemtoReg  = 3'b001;
      RegWrite  = 1'b1;
      ImmSel    = 3'b000;
      MemRW     = 1'b0; // read
      case (Fun3)
        3'b000: WHBU      = 4'b0010; // LB
        3'b001: WHBU      = 4'b0100; // LH
        3'b010: WHBU      = 4'b1000; // LW
        3'b100: WHBU      = 4'b0011; // LBU
        3'b101: WHBU      = 4'b0101; // LHU
        default: WHBU = 4'b0000;
      endcase
      Jump      = 2'b00;
      Branch    = 3'b000;
      ALU_Control = 4'b0000; // ADD for address calculation
    end
    5'b01000: begin // Store 
      is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b00;
      illegal  = 1'b0;
      ALUSrc_B  = 1'b1;
      //MemtoReg  = 2'b01;
      MemtoReg = 3'bx;
      //RegWrite  = 1'b1;
      RegWrite  = 1'b0;
      ImmSel    = 3'b001;
      MemRW     = 1'b1; // write
      case (Fun3)
        3'b000: WHBU      = 4'b0010; // SB
        3'b001: WHBU      = 4'b0100; // SH
        3'b010: WHBU      = 4'b1000; // SW
        default: WHBU = 4'b0000;
      endcase

      Jump      = 2'b00;
      Branch    = 3'b000; 
      ALU_Control = 4'b0000; // ADD for address calculation
    end
    5'b11000: begin // Branch
      is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b00;
      illegal  = 1'b0;
      WHBU      = 4'b0000;
      ALUSrc_B  = 1'b0;
      MemtoReg  = 3'bx; // new
      RegWrite  = 1'b0; //
      ImmSel    = 3'b010; 
      //MemRW     = 1'bx;

      //Branch    = 1'b1;
      Jump      = 2'b00;
      case (Fun3)
        3'b000: begin Branch = 3'b001;  ALU_Control = 4'd1; end // BEQ, do SUB in ALU 
        3'b001: begin Branch = 3'b010;  ALU_Control = 4'd1; end // BNE
        3'b100: begin Branch = 3'b011;  ALU_Control = 4'd3; end // BLT, do SLT in ALU
        3'b101: begin Branch = 3'b100;  ALU_Control = 4'd3; end // BGE
        3'b110: begin Branch = 3'b101;  ALU_Control = 4'd4; end // BLTU, do SLTU in ALU
        3'b111: begin Branch = 3'b110;  ALU_Control = 4'd4; end // BGEU 
        default: begin Branch = 3'b000; ALU_Control = 4'd0; end
      endcase
    end
    5'b11011: begin // JAL 
      is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b00;
      illegal  = 1'b0;
      WHBU     = 4'b0000;
      //ALUSrc_B = 1'b1;
      ALUSrc_B = 1'bx;
      MemtoReg = 3'b010; // PC + 4
      RegWrite = 1'b1; //
      ImmSel   = 3'b011;
      // MemRW = 1'bx;

      Jump     = 2'b01;
      Branch   = 3'b000;
    end
    5'b00100: begin // I-type ALU
      illegal  = 1'b0;
      WHBU      = 4'b0000; 
      ALUSrc_B  = 1'b1;
      MemtoReg  = 3'b000;
      RegWrite  = 1'b1;
      ImmSel    = 3'b000;
      //MemRW     = 1'bx; 

      Jump      = 2'b00;
      Branch    = 3'b000;
      case (Fun3)
        3'b000: ALU_Control = 4'b0000; // ADDI
        3'b001: ALU_Control = 4'b0010; // SLLI
        3'b010: ALU_Control = 4'b0011; // SLTI
        3'b011: ALU_Control = 4'b0100; // SLTIU
        3'b100: ALU_Control = 4'b0101; // XORI
        3'b101: ALU_Control = (Fun7) ? 4'b0111 : 4'b0110; // SRAI / SRLI
        3'b110: ALU_Control = 4'b1000; // ORI
        3'b111: ALU_Control = 4'b1001; // ANDI
        default: ALU_Control = 4'b0000;
      endcase
    end
    5'b11001: begin // I-type JALR
      is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b00;
      illegal  = 1'b0;
      WHBU      = 4'b0000;
      ALUSrc_B = 1'b1;
      MemtoReg = 3'b010; // PC + 4
      RegWrite = 1'b1; //
      ImmSel   = 3'b000;
      //MemRW    = 1'b0;

      MemRW    = 1'bx;
      Jump     = 2'b10;
      Branch   = 3'b000;
      ALU_Control = 4'd0; // ADD
    end
    5'b01101: begin // lui
      is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b00;
      illegal  = 1'b0;
      WHBU      = 4'b0000;
      ALUSrc_B = 1'bx;
      MemtoReg = 3'b011; // lui_res = Imm
      RegWrite = 1'b1; //
      ImmSel   = 3'b100; // U-type
      MemRW    = 1'bx;
      Jump     = 2'b00;
      Branch   = 3'b000;
      ALU_Control = 4'dx; // don't care: lui writes ImmOut directly, the ALU result is unused
    end
    5'b00101: begin // auipc
      is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b00;
      illegal  = 1'b0;
      WHBU      = 4'b0000;
      ALUSrc_B = 1'bx;
      MemtoReg = 3'b100; // auipc_res = PC + Imm
      RegWrite = 1'b1; //
      ImmSel   = 3'b100;
      MemRW    = 1'bx;
      Jump     = 2'b00;
      Branch   = 3'b000;
      ALU_Control = 4'dx; // don't care: PC + Imm comes from the dedicated adder, not the ALU
    end
    default: illegal = 1'b1; // illegal instruction
  endcase
end
endmodule

The changed part is:

always @(*) begin
  case (OPcode)
    5'b11100: begin // I-type csr
      illegal  = 1'b0;
      WHBU     = 4'b0;
      ALUSrc_B = 1'b0;
      MemtoReg = 3'b101;
      RegWrite = 1'b1; // write into rd
      ImmSel   = 3'b101;
      Jump     = 2'b00;
      Branch   = 3'b000;
      csr_w    = 1'b1;
      case (Fun3)
        3'b000: begin
          if (High7 == 7'b0011000) begin // mret
              mret = 1'b1;
              ecall = 1'b0;
          end
          else begin // ecall
              mret = 1'b0;
              ecall = 1'b1;
          end
        end
        3'b001: begin is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b00;end// csrrw
        3'b010: begin is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b01;end// csrrs
        3'b011: begin is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b10;end// csrrc
        3'b101: begin is_csri = 1'b1; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b00;end// csrrwi
        3'b110: begin is_csri = 1'b1; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b01;end// csrrsi
        3'b111: begin is_csri = 1'b1; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b10;end// csrrci
        default: begin is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b00;end
      endcase
    end
  endcase
end
That is, it generates the corresponding exception control signals.
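
The funct3 decoding in this branch can be summarized by a small Python model. It is a sketch with a hypothetical function name `decode_system`; for funct3 = 000 the source does not drive is_csri or csr_wsc_mode, so 0 is assumed for both here.

```python
MRET_HIGH7 = 0b0011000

def decode_system(funct3: int, high7: int):
    """Return (is_csri, csr_wsc_mode, mret, ecall) for opcode 11100."""
    if funct3 == 0b000:
        # mret is recognized by inst[31:25]; anything else is treated as ecall
        is_mret = high7 == MRET_HIGH7
        return (0, 0, int(is_mret), int(not is_mret))
    if funct3 == 0b100:
        return (0, 0, 0, 0)              # unused encoding: default arm
    is_csri = funct3 >> 2                # funct3[2]: immediate form
    csr_wsc_mode = (funct3 & 0b011) - 1  # 00 write, 01 set, 10 clear
    return (is_csri, csr_wsc_mode, 0, 0)
```

For example, `decode_system(0b111, 0)` returns `(1, 2, 0, 0)`, the csrrci case: immediate source, clear mode.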

4. DataPath code:

module DataPaths(
    input [2:0] ImmSel, input ALUSrc_B, input [2:0] MemtoReg, input [1:0] Jump, 
    input [2:0] Branch, input RegWrite, input [3:0] ALU_Control, input is_csri,
    input [1:0] csr_wsc_mode, 
    input csr_w,
    input illegal, input mret, input ecall, 
    input INT, // external interruption
    input [31:0] Data_in, input clk, input [31:0] inst_field, input rst, input [3:0] WHBU, 
    output [31:0] Reg00, output [31:0] Reg01, output [31:0] Reg02, output [31:0] Reg03,
    output [31:0] Reg04, output [31:0] Reg05, output [31:0] Reg06, output [31:0] Reg07,
    output [31:0] Reg08, output [31:0] Reg09, output [31:0] Reg10, output [31:0] Reg11,
    output [31:0] Reg12, output [31:0] Reg13, output [31:0] Reg14, output [31:0] Reg15,
    output [31:0] Reg16, output [31:0] Reg17, output [31:0] Reg18, output [31:0] Reg19,
    output [31:0] Reg20, output [31:0] Reg21, output [31:0] Reg22, output [31:0] Reg23,
    output [31:0] Reg24, output [31:0] Reg25, output [31:0] Reg26, output [31:0] Reg27,
    output [31:0] Reg28, output [31:0] Reg29, output [31:0] Reg30, output [31:0] Reg31,
    output [31:0] ALU_out, output reg [31:0] Data_out, output [31:0] PC_out,
    output wire [3:0] out_wea);

reg branch;
wire Branch_one;
wire [31:0] branch_out;
wire ALU_zero;
wire ALU_overflow;
wire [31:0] PCAddImm;
wire [31:0] ALU_A;
wire [31:0] ALU_B;
reg [31:0] reg_wt_data;
wire [31:0] PCAdd4;
reg [31:0] PC_in;
wire [31:0] ImmOut;
wire [1:0] LSwea;
wire expt_int;
reg [3:0] wea;
wire [31:0] rdata;
reg [31:0] LoadData;
wire [31:0] ReadData;

wire [31:0] mepc_bypass_in; 
wire [31:0] mcause_bypass_in;    
wire [31:0] mtval_bypass_in;  
wire [31:0] mstatus_bypass_in;      

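assign LSwea = ALU_out[1:0]; // byte offset of the effective address (rs1 + imm); ImmOut[1:0] alone is only right for word-aligned bases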
assign PCAdd4 = PC_out + 32'd4;
assign Branch_one = (Branch == 3'b000) ? 1'b0 : 1'b1;
assign PCAddImm = ImmOut + PC_out; // ALU ? Adder!

always @ (*) begin
    case (MemtoReg)
        3'd0: reg_wt_data = ALU_out; // ALU
        3'd1: begin
            case (WHBU)
                4'b0010: reg_wt_data = {{24{LoadData[7]}}, LoadData[7:0]}; // LB
                4'b0100: reg_wt_data = {{16{LoadData[15]}}, LoadData[15:0]}; // LH
                4'b1000: reg_wt_data = LoadData; // LW
                4'b0011: reg_wt_data = {24'b0, LoadData[7:0]}; // LBU
                4'b0101: reg_wt_data = {16'b0, LoadData[15:0]}; // LHU
                default: reg_wt_data = 0;
            endcase 
        end
        3'd2: reg_wt_data = PCAdd4; // JAL
        3'd3: reg_wt_data = ImmOut; // lui
        3'd4: reg_wt_data = PCAddImm; // auipc
        3'd5: reg_wt_data = rdata; // csr
        default: reg_wt_data = 32'b0;
    endcase      
end

always @ (*) begin  
    case (Branch)
        3'b001: branch = Branch_one & (ALU_zero); // BEQ
        3'b010: branch = Branch_one & (~ALU_zero); // BNE
        3'b011: branch = Branch_one & (ALU_out[0]); // BLT
        3'b100: branch = Branch_one & (~ALU_out[0]); // BGE
        3'b101: branch = Branch_one & (ALU_out[0]); // BLTU
        3'b110: branch = Branch_one & (~ALU_out[0]); // BGEU 
        default: branch = 1'b0;
    endcase
    case (Jump)
        2'b00: PC_in = branch_out;
        2'b01: PC_in = PCAddImm;
        2'b10: PC_in = ALU_out;
        default: PC_in = 32'b0;
        //2'b11: PC_in = ALU_out;
    endcase
end
always @ (*) begin
    case (LSwea)
        2'b00: begin
            LoadData = Data_in;
            Data_out = ReadData;
            case (WHBU)
                4'b0010: wea = 4'b0001;
                4'b0100: wea = 4'b0011;
                4'b1000: wea = 4'b1111;
                default: wea = 0;
            endcase
        end
        2'b01: begin
            LoadData = {8'b0, Data_in[31:8]};
            Data_out = {ReadData[23:0], 8'b0};
            case (WHBU)
                4'b0010: wea = 4'b0010;
                4'b0100: wea = 4'b0110;
                //4'b1000: wea = 4'b1111;
                default: wea = 0;
            endcase
        end
        2'b10: begin
            LoadData = {16'b0, Data_in[31:16]};
            Data_out = {ReadData[15:0], 16'b0};
            case (WHBU)
                4'b0010: wea = 4'b0100;
                4'b0100: wea = 4'b1100;
                //4'b1000: wea = 4'b1111;
                default: wea = 0;
            endcase
        end
        2'b11: begin
            LoadData = {24'b0, Data_in[31:24]};
            Data_out = {ReadData[7:0], 24'b0}; // byte lane 3: only the low byte fits in 32 bits
            case (WHBU)
                4'b0010: wea = 4'b1000;
                //4'b0100: wea = 4'b1100;
                //4'b1000: wea = 4'b1111;
                default: wea = 0;
            endcase
        end
    endcase

end
wire [31:0] wdata;
assign ALU_B = ALUSrc_B ? ImmOut : ReadData;
assign branch_out = branch ? PCAddImm : PCAdd4;
assign wdata = is_csri ? ImmOut : ALU_A; // uimm or rs1
wire pc_change;
wire en;
ALU alu(.A(ALU_A), .B(ALU_B), .ALU_operation(ALU_Control), .res(ALU_out), 
    .zero(ALU_zero), .overflow(ALU_overflow));
wire [4:0] RS1, RS2, WT;
assign RS1 = inst_field[19:15];
assign RS2 = inst_field[24:20];
assign WT = inst_field[11:7];
Regs regs(.clk(clk), .rst(rst), .Rs1_addr(RS1), .Rs2_addr(RS2), 
    .Wt_addr(WT), .Wt_data(reg_wt_data), .RegWrite(RegWrite & en), 
    .Rs1_data(ALU_A), .Rs2_data(ReadData),
    .Reg00(Reg00),.Reg01(Reg01),.Reg02(Reg02),.Reg03(Reg03),
    .Reg04(Reg04),.Reg05(Reg05),.Reg06(Reg06),.Reg07(Reg07),
    .Reg08(Reg08),.Reg09(Reg09),.Reg10(Reg10),.Reg11(Reg11),
    .Reg12(Reg12),.Reg13(Reg13),.Reg14(Reg14),.Reg15(Reg15),
    .Reg16(Reg16),.Reg17(Reg17),.Reg18(Reg18),.Reg19(Reg19),
    .Reg20(Reg20),.Reg21(Reg21),.Reg22(Reg22),.Reg23(Reg23),
    .Reg24(Reg24),.Reg25(Reg25),.Reg26(Reg26),.Reg27(Reg27),
    .Reg28(Reg28),.Reg29(Reg29),.Reg30(Reg30),.Reg31(Reg31));



wire [11:0] addr;
wire [31:0] mstatus;
wire [31:0] mtvec;
wire [31:0] mepc;
assign addr = inst_field[31:20];
CSRRegs csrregs(.clk(clk), .rst(rst), .raddr(addr), 
    .waddr(addr), .wdata(wdata), .rdata(rdata), .csr_w(csr_w),
    .csr_wsc_mode(csr_wsc_mode), .expt_int(expt_int), .mepc_bypass_in(mepc_bypass_in),
    .mcause_bypass_in(mcause_bypass_in), .mtval_bypass_in(mtval_bypass_in), .mstatus_bypass_in(mstatus_bypass_in),
    .mstatus(mstatus), .mtvec(mtvec), .mepc(mepc)
    );


wire [3:0] EN;
assign EN = {en, en, en, en};
assign out_wea = wea & EN;

wire [31:0] INTPC;
RV_INT rv_int(.clk(clk), .rst(rst), .INT(INT), .expt_int(expt_int), .ecall(ecall), .mret(mret), 
    .illegal_inst(illegal), .pc_current(PC_out), .en(en), .pc(INTPC), .pc_change(pc_change), 
    .mepc_bypass_in(mepc_bypass_in), .mcause_bypass_in(mcause_bypass_in), 
    .mtval_bypass_in(mtval_bypass_in), .mstatus_bypass_in(mstatus_bypass_in),
    .mstatus(mstatus), .mtvec(mtvec), .mepc(mepc), .inst(inst_field));

ImmGen immgen(.ImmSel(ImmSel), .inst_field(inst_field), .Imm_out(ImmOut));


wire [31:0] PC_IN;
assign PC_IN = pc_change ? INTPC : PC_in;

Reg PC(.clk(clk), .rst(rst), .CE(1'b1), .D(PC_IN), .Q(PC_out));

endmodule
The changed parts are:

wire [31:0] wdata;
assign wdata = is_csri ? ImmOut : ALU_A; // uimm or rs1
wire pc_change;
wire [11:0] addr;
wire [31:0] mstatus;
wire [31:0] mtvec;
wire [31:0] mepc;
assign addr = inst_field[31:20];
CSRRegs csrregs(.clk(clk), .rst(rst), .raddr(addr), 
    .waddr(addr), .wdata(wdata), .rdata(rdata), .csr_w(csr_w),
    .csr_wsc_mode(csr_wsc_mode), .expt_int(expt_int), .mepc_bypass_in(mepc_bypass_in),
    .mcause_bypass_in(mcause_bypass_in), .mtval_bypass_in(mtval_bypass_in), .mstatus_bypass_in(mstatus_bypass_in),
    .mstatus(mstatus), .mtvec(mtvec), .mepc(mepc)
    );

wire [31:0] INTPC;
RV_INT rv_int(.clk(clk), .rst(rst), .INT(INT), .expt_int(expt_int), .ecall(ecall), .mret(mret), 
    .illegal_inst(illegal), .pc_current(PC_out), .en(en), .pc(INTPC), .pc_change(pc_change), 
    .mepc_bypass_in(mepc_bypass_in), .mcause_bypass_in(mcause_bypass_in), 
    .mtval_bypass_in(mtval_bypass_in), .mstatus_bypass_in(mstatus_bypass_in),
    .mstatus(mstatus), .mtvec(mtvec), .mepc(mepc), .inst(inst_field));

wire [31:0] PC_IN;
assign PC_IN = pc_change ? INTPC : PC_in;

Reg PC(.clk(clk), .rst(rst), .CE(1'b1), .D(PC_IN), .Q(PC_out));
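That is, the CSRRegs and RV_INT modules are instantiated and wired up, and the PC input is selected between the trap PC from RV_INT (INTPC, taken when pc_change is asserted) and the normal next PC (PC_in)

Key simulation steps

The simulation code is the same as in Lab 4-2

2. Simulation test program: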

    j    start            # 00
dummy:
    nop                   # 04
    nop                   # 08
    nop                   # 0C
    nop                   # 10
    nop                   # 14
    nop                   # 18
    nop                   # 1C
    j    dummy

start:
    addi x1, x0, 1
    add x2, x1, x0
    add x3, x2, x1
    add x4, x3, x2
    add x5, x4, x3
    add x6, x5, x4
    addi x7, x0, 0x7c
    csrrs x0, 0x305, x7
    csrrc x0, 0x341, x3
    csrrci x0, 0x342, 4    # immediate forms take a 5-bit uimm; 4 matches the rs1 field of x4
    csrrsi x0, 0x343, 5
    csrrwi x21, 0x300, 8
    # Here will be an illegal instruction
    add x7, x6, x5
    add x8, x7, x6
    add x9, x8, x7
    ecall
    add x10, x9, x8
    add x11, x10, x9
    add x12, x11, x10
pass_1:
    li   x31, 0x666
    j    dummy
trap:
    csrrs x21, 0x300, x0 # mstatus
    csrrs x22, 0x305, x0 # mtvec
    csrrs x23, 0x341, x0 # mepc
    csrrs x24, 0x342, x0 # mcause
    csrrs x25, 0x343, x0 # mtval
    lui x26, 0x80000
    addi x26, x26, 0x00B
    beq x24, x26, return
illegal_ecall:
    addi x23, x23, 4
    csrrw x0, 0x341, x23
    beq x0, x0, return
return:
    add x0, x0, x0

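This is a basic test program for the CSR instructions, and it also contains the trap handler

During the waveform simulation, a hardware interrupt, an illegal instruction, and a software interrupt (ecall) are injected

Results and analysis

Simulation results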

(Simulation waveform screenshots)

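As the waveforms show, when the first illegal instruction ffffffff appears, the processor saves mepc=54 and jumps to the trap handler

In the trap handler, the five exception-related CSRs are copied into general-purpose registers and mepc is updated to 58: since this is not a hardware interrupt, the handler sets mepc = mepc + 4

The program then returns to pc=58 and continues running

Next, the program reaches the ecall instruction; again mepc=64 is saved and control jumps to the trap handler

In the trap handler, the five exception-related CSRs are copied into general-purpose registers and mepc is updated to 68: since this is not a hardware interrupt, the handler sets mepc = mepc + 4

During this time INT is set to 1, meaning an external interrupt is being requested, but no new interrupt is taken because the processor is already inside a trap; this matches the requirement that traps do not nest

The program then returns to pc=68 and continues running

Next, the program encounters another external interrupt; again mepc=6c is saved and control jumps to the trap handler

In the trap handler, the five exception-related CSRs are copied into general-purpose registers and mepc stays at 6c: since this is a hardware interrupt, mepc is left unchanged

The program then returns to pc=6c and continues running

Finally the program reaches pc=78, where the main program ends: Reg31 = 0x666 and execution enters the dummy loop, which shows that the test passes and the results match expectations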

Discussion question

When loading a large immediate, we often reach for lui & addi. For example, the following code assigns 0x22223333 to t0:

lui t0, 0x22223
addi t0, t0, 0x333
Can the following code produce 0xDEADBEEF? If you think it cannot, first explain why, then change a single character in the code so that it does produce 0xDEADBEEF
lui t1, 0xDEADB 
addi t1, t1, -273 // 0xEEF

Answer:

The code above produces DEADAEEF. The reason is that addi sign-extends the 12-bit field in the instruction to a 32-bit immediate, so -273 becomes FFFFFEEF; adding it to the DEADB000 produced by lui gives DEADAEEF

A one-character fix is:

lui t1, 0xDEADC
addi t1, t1, -273 // 0xEEF

This way the upper 20 bits DEADC are added to FFFFF, giving DEADB, and the final result is DEADBEEF
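
The arithmetic behind this answer can be written out explicitly. All additions are modulo \(2^{32}\), and \(\mathrm{sext}\) denotes sign extension of the 12-bit immediate to 32 bits (a notation introduced here only for illustration):

\[
\begin{aligned}
-273 &\equiv \texttt{0xEEF} \pmod{2^{12}}, \qquad \mathrm{sext}(\texttt{0xEEF}) = \texttt{0xFFFFFEEF},\\
\texttt{0xDEADB000} + \texttt{0xFFFFFEEF} &\equiv \texttt{0xDEADAEEF} \pmod{2^{32}},\\
\texttt{0xDEADC000} + \texttt{0xFFFFFEEF} &\equiv \texttt{0xDEADBEEF} \pmod{2^{32}}.
\end{aligned}
\]

In general, whenever bit 11 of the low 12 bits is set, the value passed to lui has to be one larger than the upper 20 bits of the target constant.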

PCPU

Lab 5-1

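Procedure and experimental steps

Code design hierarchy diagram and description

(Figure: pipelined CPU structure diagram)

In a pipelined CPU, the single-cycle CPU is split into \(5\) stages: IF, ID, EX, MEM, and WB

The important signals of each stage are passed to the next stage through the stage (pipeline) registers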

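  • IF stage: selects the next PC according to PCSrc and fetches the corresponding instruction from ROM

  • ID stage: contains SCPU_Ctrl, which generates the control signals; the RegFile, where register values are read and written; and ImmGen, which generates the immediate for the instruction. It also generates StoreData, the value to be stored by store instructions

  • EX stage: contains the ALU, which performs the computation

  • MEM stage: outputs the value to be written to RAM and passes the value read from RAM to the next stage. It also generates PCSrc for branches and jumps

  • WB stage: produces the value to be written to the register file and sends it back to the ID stage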

Source code

  1. Top-level pipelined CPU code:
module Pipeline_CPU(
    input clk,
    input rst,
    input [31:0] Data_in,  
    input [31:0] inst_IF,  
    output [31:0] Addr_out, 
    output [31:0] Data_out,  
    output [31:0] Data_out_WB,
    output [31:0] PC_out_IF,  
    output [31:0] inst_ID,  
    output [31:0] PC_out_ID,
    output [31:0] PC_out_EX,
    output MemRW_EX,  
    output MemRW_Mem, 
    output [3:0] wea,
    output [31:0] Reg00, output [31:0] Reg01,output [31:0] Reg02,output [31:0] Reg03,
    output [31:0] Reg04, output [31:0] Reg05,output [31:0] Reg06,output [31:0] Reg07,
    output [31:0] Reg08, output [31:0] Reg09,output [31:0] Reg10,output [31:0] Reg11,
    output [31:0] Reg12, output [31:0] Reg13,output [31:0] Reg14,output [31:0] Reg15,
    output [31:0] Reg16, output [31:0] Reg17,output [31:0] Reg18,output [31:0] Reg19,
    output [31:0] Reg20, output [31:0] Reg21,output [31:0] Reg22,output [31:0] Reg23,
    output [31:0] Reg24, output [31:0] Reg25,output [31:0] Reg26,output [31:0] Reg27,
    output [31:0] Reg28, output [31:0] Reg29,output [31:0] Reg30,output [31:0] Reg31

);
    wire [31:0] PC_out_EXMem, ALU_out_EXMem;
    wire [4:0] Rd_addr_out_MemWB;
    wire RegWrite_out_MemWB;
    wire [2:0] PCSrc;

    Pipeline_IF PPLIF (
        .clk_IF(clk),
        .rst_IF(rst),
        .en_IF(1'b1),
        .PC_in_IF(PC_out_EXMem),
        .PCSrc(PCSrc),
        .ALU_in_IF(ALU_out_EXMem),
        .PC_out_IF(PC_out_IF)

    );
    wire [31:0] PC_out_IFID, inst_out_IFID;


    IF_reg_ID PPLIFID (
        .clk_IFID(clk),
        .rst_IFID(rst),
        .en_IFID(1'b1), 
        .PC_in_IFID(PC_out_IF),
        .inst_in_IFID(inst_IF),
        .PC_out_IFID(PC_out_IFID),
        .inst_out_IFID(inst_out_IFID)
    );


    wire [31:0] Rs1_out_ID, Rs2_out_ID, 
    Imm_out_ID;
    wire ALUSrc_B_ID, MemRW_ID, RegWrite_out_ID;
    wire [4:0] Rd_addr_out_ID;
    wire [3:0] ALU_control_ID;
    wire [2:0] Branch_ID;
    wire [1:0] Jump_ID;
    wire [2:0] MemtoReg_ID;
    wire [3:0] WHBU_ID;
    wire [1:0] LSwea_ID;
    wire [31:0] StoreData_ID;
    wire [3:0] wea_ID;



    Pipeline_ID PPLID (
        .clk_ID(clk),
        .rst_ID(rst),
        .RegWrite_in_ID(RegWrite_out_MemWB), 
        .Rd_addr_ID(Rd_addr_out_MemWB), 
        .Wt_data_ID(Data_out_WB), 
        .inst_in_ID(inst_out_IFID), 
        .Rd_addr_out_ID(Rd_addr_out_ID),
        .Rs1_out_ID(Rs1_out_ID), 
        .Rs2_out_ID(Rs2_out_ID), 
        .Imm_out_ID(Imm_out_ID), 
        .ALUSrc_B_ID(ALUSrc_B_ID), 
        .ALU_control_ID(ALU_control_ID),
        .Branch_ID(Branch_ID), 
        .MemRW_ID(MemRW_ID),
        .Jump_ID(Jump_ID), 
        .MemtoReg_ID(MemtoReg_ID), 
        .RegWrite_out_ID(RegWrite_out_ID),
        .WHBU_ID(WHBU_ID),
        .LSwea_ID(LSwea_ID),
        .StoreData_ID(StoreData_ID),
        .wea_ID(wea_ID),
        .Reg00(Reg00),.Reg01(Reg01),.Reg02(Reg02),.Reg03(Reg03),
        .Reg04(Reg04),.Reg05(Reg05),.Reg06(Reg06),.Reg07(Reg07),
        .Reg08(Reg08),.Reg09(Reg09),.Reg10(Reg10),.Reg11(Reg11),
        .Reg12(Reg12),.Reg13(Reg13),.Reg14(Reg14),.Reg15(Reg15),
        .Reg16(Reg16),.Reg17(Reg17),.Reg18(Reg18),.Reg19(Reg19),
        .Reg20(Reg20),.Reg21(Reg21),.Reg22(Reg22),.Reg23(Reg23),
        .Reg24(Reg24),.Reg25(Reg25),.Reg26(Reg26),.Reg27(Reg27),
        .Reg28(Reg28),.Reg29(Reg29),.Reg30(Reg30),.Reg31(Reg31)
    );
    wire [31:0] PC_out_IDEX, Rs1_out_IDEX, Rs2_out_IDEX, Imm_out_IDEX; 
    wire [4:0] Rd_addr_out_IDEX;
    wire ALUSrc_B_out_IDEX, MemRW_out_IDEX, RegWrite_out_IDEX;

    wire [3:0] ALU_control_out_IDEX;
    wire [2:0] Branch_out_IDEX;
    wire [1:0] Jump_out_IDEX;
    wire [2:0] MemtoReg_out_IDEX;
    wire [3:0] WHBU_out_IDEX;
    wire [1:0] LSwea_out_IDEX;
    wire [31:0] StoreData_out_IDEX;
    wire [3:0] wea_out_IDEX;


    ID_reg_Ex PPLIDEx (
        .clk_IDEX(clk),
        .rst_IDEX(rst),
        .en_IDEX(1'b1),
        .PC_in_IDEX(PC_out_IFID),
        .Rd_addr_IDEX(Rd_addr_out_ID),
        .Rs1_in_IDEX(Rs1_out_ID),
        .Rs2_in_IDEX(Rs2_out_ID),
        .Imm_in_IDEX(Imm_out_ID),
        .ALUSrc_B_in_IDEX(ALUSrc_B_ID),
        .ALU_control_in_IDEX(ALU_control_ID),
        .Branch_in_IDEX(Branch_ID),
        .Jump_in_IDEX(Jump_ID),
        .MemRW_in_IDEX(MemRW_ID),
        .MemtoReg_in_IDEX(MemtoReg_ID),
        .RegWrite_in_IDEX(RegWrite_out_ID),
        .WHBU_in_IDEX(WHBU_ID),
        .LSwea_in_IDEX(LSwea_ID),
        .StoreData_in_IDEX(StoreData_ID),
        .wea_in_IDEX(wea_ID),
        .PC_out_IDEX(PC_out_IDEX),
        .Rd_addr_out_IDEX(Rd_addr_out_IDEX),
        .Rs1_out_IDEX(Rs1_out_IDEX),
        .Rs2_out_IDEX(Rs2_out_IDEX),
        .Imm_out_IDEX(Imm_out_IDEX),
        .ALUSrc_B_out_IDEX(ALUSrc_B_out_IDEX),
        .ALU_control_out_IDEX(ALU_control_out_IDEX),
        .Branch_out_IDEX(Branch_out_IDEX),
        .Jump_out_IDEX(Jump_out_IDEX),
        .MemRW_out_IDEX(MemRW_out_IDEX),
        .MemtoReg_out_IDEX(MemtoReg_out_IDEX),
        .RegWrite_out_IDEX(RegWrite_out_IDEX),
        .WHBU_out_IDEX(WHBU_out_IDEX),
        .LSwea_out_IDEX(LSwea_out_IDEX),
        .StoreData_out_IDEX(StoreData_out_IDEX),
        .wea_out_IDEX(wea_out_IDEX)

    );
    wire [31:0] PC4_out_EX, ALU_out_EX, Rs2_out_EX;
    wire zero_out_EX; // the ALU zero flag is a single bit

    Pipeline_Ex PPLEx (
        .PC_in_EX(PC_out_IDEX),
        .Rs1_in_EX(Rs1_out_IDEX),
        .Rs2_in_EX(Rs2_out_IDEX),
        .Imm_in_EX(Imm_out_IDEX),
        .ALUSrc_B_in_EX(ALUSrc_B_out_IDEX),
        .ALU_control_in_EX(ALU_control_out_IDEX),
        .PC_out_EX(PC_out_EX),
        .PC4_out_EX(PC4_out_EX),
        .zero_out_EX(zero_out_EX),
        .ALU_out_EX(ALU_out_EX),
        .Rs2_out_EX(Rs2_out_EX)
    );
    wire [31:0] PC4_out_EXMem, Rs2_out_EXMem, Imm_out_EXMem;
    wire [4:0] Rd_addr_out_EXMem;
    wire zero_out_EXMem, MemRW_out_EXMem, RegWrite_out_EXMem;
    wire [2:0] Branch_out_EXMem;
    wire [1:0] Jump_out_EXMem;
    wire [2:0] MemtoReg_out_EXMem;
    wire [3:0] WHBU_out_EXMem;
    wire [1:0] LSwea_out_EXMem;
    wire [31:0] StoreData_out_EXMem;
    wire [3:0] wea_out_EXMem;


    Ex_reg_Mem PPLExMem (
        .clk_EXMem(clk),
        .rst_EXMem(rst),
        .en_EXMem(1'b1),
        .PC_in_EXMem(PC_out_EX),
        .Imm_in_EXMem(Imm_out_IDEX),
        .PC4_in_EXMem(PC4_out_EX),
        .Rd_addr_EXMem(Rd_addr_out_IDEX),
        .zero_in_EXMem(zero_out_EX),
        .ALU_in_EXMem(ALU_out_EX),
        .Rs2_in_EXMem(Rs2_out_EX),
        .Branch_in_EXMem(Branch_out_IDEX),
        .Jump_in_EXMem(Jump_out_IDEX),
        .MemRW_in_EXMem(MemRW_out_IDEX),
        .MemtoReg_in_EXMem(MemtoReg_out_IDEX),
        .RegWrite_in_EXMem(RegWrite_out_IDEX),
        .WHBU_in_EXMem(WHBU_out_IDEX),
        .LSwea_in_EXMem(LSwea_out_IDEX),
        .StoreData_in_EXMem(StoreData_out_IDEX),
        .wea_in_EXMem(wea_out_IDEX),
        .PC_out_EXMem(PC_out_EXMem),
        .Imm_out_EXMem(Imm_out_EXMem),
        .PC4_out_EXMem(PC4_out_EXMem),
        .Rd_addr_out_EXMem(Rd_addr_out_EXMem),
        .zero_out_EXMem(zero_out_EXMem),
        .ALU_out_EXMem(ALU_out_EXMem),

        .Rs2_out_EXMem(Rs2_out_EXMem),
        .Branch_out_EXMem(Branch_out_EXMem),
        .Jump_out_EXMem(Jump_out_EXMem),
        .MemRW_out_EXMem(MemRW_out_EXMem),
        .MemtoReg_out_EXMem(MemtoReg_out_EXMem),
        .RegWrite_out_EXMem(RegWrite_out_EXMem),
        .WHBU_out_EXMem(WHBU_out_EXMem),
        .LSwea_out_EXMem(LSwea_out_EXMem),
        .StoreData_out_EXMem(StoreData_out_EXMem),
        .wea_out_EXMem(wea_out_EXMem)
    );

    Pipeline_Mem PPLMem (
        .zero_in_Mem(zero_out_EXMem),
        .res_in_Mem(ALU_out_EXMem),
        .Branch_in_Mem(Branch_out_EXMem),
        .Jump_in_Mem(Jump_out_EXMem),
        .PCSrc(PCSrc)
    );
    wire [31:0] PC_out_MemWB, Imm_out_MemWB, PC4_out_MemWB, 
        ALU_out_MemWB, Dmem_data_out_MemWB;

    wire [2:0] MemtoReg_out_MemWB;
    //wire RegWrite_out_MemWB;
    wire [3:0] WHBU_out_MemWB;
    wire [1:0] LSwea_out_MemWB;


    Mem_reg_WB PPLMemWB (
        .clk_MemWB(clk),
        .rst_MemWB(rst),
        .en_MemWB(1'b1),
        .PC_in_MemWB(PC_out_EXMem),
        .Imm_in_MemWB(Imm_out_EXMem),
        .PC4_in_MemWB(PC4_out_EXMem),
        .Rd_addr_MemWB(Rd_addr_out_EXMem),
        .ALU_in_MemWB(ALU_out_EXMem),
        .Dmem_data_MemWB(Data_in),
        .MemtoReg_in_MemWB(MemtoReg_out_EXMem),
        .RegWrite_in_MemWB(RegWrite_out_EXMem),
        .WHBU_in_MemWB(WHBU_out_EXMem),
        .LSwea_in_MemWB(LSwea_out_EXMem),
        .PC_out_MemWB(PC_out_MemWB),
        .Imm_out_MemWB(Imm_out_MemWB),
        .PC4_out_MemWB(PC4_out_MemWB),
        .Rd_addr_out_MemWB(Rd_addr_out_MemWB),
        .ALU_out_MemWB(ALU_out_MemWB),
        .Dmem_data_out_MemWB(Dmem_data_out_MemWB),
        .MemtoReg_out_MemWB(MemtoReg_out_MemWB),
        .RegWrite_out_MemWB(RegWrite_out_MemWB),
        .WHBU_out_MemWB(WHBU_out_MemWB),
        .LSwea_out_MemWB(LSwea_out_MemWB)
    );

    Pipeline_WB PPLWB (
        .PC4_in_WB(PC4_out_MemWB),
        .ALU_in_WB(ALU_out_MemWB),
        .Dmem_data_WB(Dmem_data_out_MemWB),
        .Imm_in_WB(Imm_out_MemWB),
        .PC_in_WB(PC_out_MemWB),
        .MemtoReg_in_WB(MemtoReg_out_MemWB),
        .WHBU_in_WB(WHBU_out_MemWB),
        .LSwea_in_WB(LSwea_out_MemWB),
        .Data_out_WB(Data_out_WB)
    );
    assign MemRW_EX = MemRW_out_IDEX;
    assign MemRW_Mem = MemRW_out_EXMem;
    assign Addr_out = ALU_out_EXMem;
    assign Data_out = StoreData_out_EXMem;
    assign wea = wea_out_EXMem;
    assign PC_out_ID = PC_out_IFID;
    assign inst_ID = inst_out_IFID;

endmodule

\(9\) modules are instantiated: \(5\) stages + \(4\) pipeline registers

  1. IF stage code:
module Pipeline_IF( 
    input clk_IF, // clock
    input rst_IF, // reset
    input en_IF, // enable
    input [31:0] PC_in_IF, // PC input for instruction fetch, = PCAddImm
    input [2:0] PCSrc, // next-PC select
    input [31:0] ALU_in_IF, // ALU result (jalr target)
    output wire [31:0] PC_out_IF // PC output
); 

reg [31:0] PC_in;
wire [31:0] PCAdd4;
assign PCAdd4 = PC_out_IF + 32'd4;
always @ (*) begin
    case (PCSrc)
        3'b000: PC_in = PCAdd4;
        3'b100: PC_in = PC_in_IF; // branch
        3'b001: PC_in = PC_in_IF; // jal
        3'b010: PC_in = ALU_in_IF; // jalr
        default: PC_in = 32'b0;
    endcase
end
Reg PC(.clk(clk_IF), .rst(rst_IF), .CE(en_IF), .D(PC_in), .Q(PC_out_IF));
endmodule

Selects the next PC according to PCSrc and outputs the current PC

  1. ID stage code:
module Pipeline_ID( 
    input clk_ID, // clock
    input rst_ID, // reset
    input RegWrite_in_ID, // register file write enable (from WB)
    input [4:0] Rd_addr_ID, // write destination address input
    input [31:0] Wt_data_ID, // write data input
    input [31:0] inst_in_ID, // instruction input
    output [4:0] Rd_addr_out_ID, // write destination address output
    output [31:0] Rs1_out_ID , // operand 1 output
    output [31:0] Rs2_out_ID , // operand 2 output
    output [31:0] Imm_out_ID , // immediate output
    output ALUSrc_B_ID , // ALU B input select
    output [3:0] ALU_control_ID, // ALU control
    output [2:0] Branch_ID, // branch control (Beq etc.)
    output MemRW_ID, // memory read/write
    output [1:0] Jump_ID, // jump control (Jal etc.)
    output [2:0] MemtoReg_ID, // write-back select
    output RegWrite_out_ID, // register file write enable
    output [3:0] WHBU_ID,
    output [1:0] LSwea_ID,
    output reg [31:0] StoreData_ID,
    output reg [3:0] wea_ID,
    output [31:0] Reg00, output [31:0] Reg01,output [31:0] Reg02,output [31:0] Reg03,
    output [31:0] Reg04, output [31:0] Reg05,output [31:0] Reg06,output [31:0] Reg07,
    output [31:0] Reg08, output [31:0] Reg09,output [31:0] Reg10,output [31:0] Reg11,
    output [31:0] Reg12, output [31:0] Reg13,output [31:0] Reg14,output [31:0] Reg15,
    output [31:0] Reg16, output [31:0] Reg17,output [31:0] Reg18,output [31:0] Reg19,
    output [31:0] Reg20, output [31:0] Reg21,output [31:0] Reg22,output [31:0] Reg23,
    output [31:0] Reg24, output [31:0] Reg25,output [31:0] Reg26,output [31:0] Reg27,
    output [31:0] Reg28, output [31:0] Reg29,output [31:0] Reg30,output [31:0] Reg31
); 
wire [4:0] RS1, RS2, WT;
wire [4:0] OP;
wire [2:0] FUN3;
wire FUN7;
assign RS1 = inst_in_ID[19:15];
assign RS2 = inst_in_ID[24:20];
assign WT = inst_in_ID[11:7];
assign OP = inst_in_ID[6:2];
assign FUN3 = inst_in_ID[14:12];
assign FUN7 = inst_in_ID[30];
assign LSwea_ID = Imm_out_ID[1:0];
wire [2:0] ImmSel;
ImmGen immgen(.ImmSel(ImmSel), .inst_field(inst_in_ID), .Imm_out(Imm_out_ID));
SCPU_ctrls Ctrl(.OPcode(OP), .Fun3(FUN3), .Fun7(FUN7), 
    .ImmSel(ImmSel), .ALUSrc_B(ALUSrc_B_ID), .MemtoReg(MemtoReg_ID), 
    .Jump(Jump_ID), .Branch(Branch_ID), .RegWrite(RegWrite_out_ID), .MemRW(MemRW_ID), 
    .ALU_Control(ALU_control_ID), .WHBU(WHBU_ID)
    );
Regs regs(.clk(clk_ID), .rst(rst_ID), .Rs1_addr(RS1), .Rs2_addr(RS2), 
    .Wt_addr(Rd_addr_ID), .Wt_data(Wt_data_ID), .RegWrite(RegWrite_in_ID), 
    .Rs1_data(Rs1_out_ID), .Rs2_data(Rs2_out_ID),
    .Reg00(Reg00),.Reg01(Reg01),.Reg02(Reg02),.Reg03(Reg03),
    .Reg04(Reg04),.Reg05(Reg05),.Reg06(Reg06),.Reg07(Reg07),
    .Reg08(Reg08),.Reg09(Reg09),.Reg10(Reg10),.Reg11(Reg11),
    .Reg12(Reg12),.Reg13(Reg13),.Reg14(Reg14),.Reg15(Reg15),
    .Reg16(Reg16),.Reg17(Reg17),.Reg18(Reg18),.Reg19(Reg19),
    .Reg20(Reg20),.Reg21(Reg21),.Reg22(Reg22),.Reg23(Reg23),
    .Reg24(Reg24),.Reg25(Reg25),.Reg26(Reg26),.Reg27(Reg27),
    .Reg28(Reg28),.Reg29(Reg29),.Reg30(Reg30),.Reg31(Reg31)
    );
assign Rd_addr_out_ID = inst_in_ID[11:7];

always @ (*) begin
    case (LSwea_ID)
        2'b00: begin
            StoreData_ID = Rs2_out_ID;
            case (WHBU_ID)
                4'b0010: wea_ID = 4'b0001;
                4'b0100: wea_ID = 4'b0011;
                4'b1000: wea_ID = 4'b1111;
                default: wea_ID = 0;
            endcase
        end
        2'b01: begin
            StoreData_ID = {Rs2_out_ID[23:0], 8'b0};
            case (WHBU_ID)
                4'b0010: wea_ID = 4'b0010;
                4'b0100: wea_ID = 4'b0110;
                default: wea_ID = 0;
            endcase
        end
        2'b10: begin
            StoreData_ID = {Rs2_out_ID[15:0], 16'b0};
            case (WHBU_ID)
                4'b0010: wea_ID = 4'b0100;
                4'b0100: wea_ID = 4'b1100;
                default: wea_ID = 0;
            endcase
        end
        2'b11: begin
            StoreData_ID = {Rs2_out_ID[7:0], 24'b0}; // byte lane 3: only the low byte fits in 32 bits
            case (WHBU_ID)
                4'b0010: wea_ID = 4'b1000;
                default: wea_ID = 0;
            endcase
        end

    endcase
end    
endmodule

Instantiates the register file, the immediate generator, and the control unit

  1. EX stage code:
module Pipeline_Ex( 
    input[31:0] PC_in_EX, // PC input
    input[31:0] Rs1_in_EX, // operand 1 input
    input[31:0] Rs2_in_EX, // operand 2 input
    input[31:0] Imm_in_EX, // immediate
    input ALUSrc_B_in_EX, // ALU B select
    input [3:0] ALU_control_in_EX, // ALU operation select
    output [31:0] PC_out_EX, // PC output, PCAddImm
    output [31:0] PC4_out_EX, // PC+4 output
    output zero_out_EX, // ALU zero flag output
    output [31:0] ALU_out_EX, // ALU result output
    output [31:0] Rs2_out_EX // operand 2 output
);
wire [31:0] ALUB;
assign PC_out_EX = PC_in_EX + Imm_in_EX; // PC_out = PC + Imm
assign PC4_out_EX = PC_in_EX + 32'd4;
assign Rs2_out_EX = Rs2_in_EX;
assign ALUB = (ALUSrc_B_in_EX) ? Imm_in_EX : Rs2_in_EX;
ALU alu(.A(Rs1_in_EX), .B(ALUB), .ALU_operation(ALU_control_in_EX), .res(ALU_out_EX), 
    .zero(zero_out_EX));

endmodule

Instantiates the ALU

  1. MEM stage code:
module Pipeline_Mem( 
    input zero_in_Mem, //zero
    input [31:0] res_in_Mem, // ALU res
    input [2:0] Branch_in_Mem, //beq
    input [1:0] Jump_in_Mem, //jal
    output [2:0] PCSrc // PC select control output
);
wire Branch_one;
assign Branch_one = (Branch_in_Mem == 3'b000) ? 1'b0 : 1'b1;
reg branch;
always @ (*) begin  
    case (Branch_in_Mem)
        3'b001: branch = Branch_one & (zero_in_Mem); // BEQ
        3'b010: branch = Branch_one & (~zero_in_Mem); // BNE
        3'b011: branch = Branch_one & (res_in_Mem[0]); // BLT
        3'b100: branch = Branch_one & (~res_in_Mem[0]); // BGE
        3'b101: branch = Branch_one & (res_in_Mem[0]); // BLTU
        3'b110: branch = Branch_one & (~res_in_Mem[0]); // BGEU 
        default: branch = 1'b0;
    endcase
end

assign PCSrc = {branch, Jump_in_Mem}; 
endmodule

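Generates PCSrc; the interaction with RAM in this stage goes through the top-level outputs (Addr_out, Data_out, wea), which are driven from the EX/MEM register

  1. WB stage code: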
module Pipeline_WB( 
    input [31:0] PC4_in_WB, // PC+4 input
    input [31:0] ALU_in_WB, // ALU result input
    input [31:0] Dmem_data_WB, // data memory read data input
    input [31:0] Imm_in_WB, // immediate input
    input [31:0] PC_in_WB, // PC + immediate input
    input [3:0] WHBU_in_WB,
    input [1:0] LSwea_in_WB,
    input [2:0] MemtoReg_in_WB, // write-back select control
    output [31:0] Data_out_WB // write-back data output
);
reg [31:0] reg_wt_data;
reg [31:0] LoadData;
wire [31:0]  Data_in;
assign Data_in = Dmem_data_WB;
assign Data_out_WB = reg_wt_data;
always @ (*) begin
    case (MemtoReg_in_WB)
        3'd0: reg_wt_data = ALU_in_WB; // ALU
        3'd1: begin
            case (WHBU_in_WB)
                4'b0010: reg_wt_data = {{24{LoadData[7]}}, LoadData[7:0]}; // LB
                4'b0100: reg_wt_data = {{16{LoadData[15]}}, LoadData[15:0]}; // LH
                4'b1000: reg_wt_data = LoadData; // LW
                4'b0011: reg_wt_data = {24'b0, LoadData[7:0]}; // LBU
                4'b0101: reg_wt_data = {16'b0, LoadData[15:0]}; // LHU
                default: reg_wt_data = 0;
            endcase 
        end
        3'd2: reg_wt_data = PC4_in_WB; // JAL
        3'd3: reg_wt_data = Imm_in_WB; // lui
        3'd4: reg_wt_data = PC_in_WB; // auipc
        default: reg_wt_data = 32'b0;
    endcase      
end
always @ (*) begin
    case (LSwea_in_WB)
        2'b00: LoadData = Data_in;
        2'b01: LoadData = {8'b0, Data_in[31:8]};
        2'b10: LoadData = {16'b0, Data_in[31:16]};
        2'b11: LoadData = {24'b0, Data_in[31:24]};
    endcase
end

endmodule

Produces the write-back data

  1. IF/ID register code:
module IF_reg_ID( 
    input clk_IFID, // register clock
    input rst_IFID, // register reset
    input en_IFID, // register enable
    input [31:0] PC_in_IFID, // PC input
    input [31:0] inst_in_IFID, // instruction input
    output [31:0] PC_out_IFID, // PC output
    output [31:0] inst_out_IFID // instruction output
); 

Reg PC(.clk(clk_IFID), .rst(rst_IFID), .CE(en_IFID), .D(PC_in_IFID), .Q(PC_out_IFID));
Reg inst(.clk(clk_IFID), .rst(rst_IFID), .CE(en_IFID), .D(inst_in_IFID), .Q(inst_out_IFID));
endmodule

Passes the instruction on to the ID stage

  1. ID/EX register code:
module ID_reg_Ex( 
    input clk_IDEX, // register clock
    input rst_IDEX, // register reset
    input en_IDEX, // register enable
    input [31:0] PC_in_IDEX, // PC input
    input [4:0] Rd_addr_IDEX, // write destination input
    input [31:0] Rs1_in_IDEX, // operand 1 input
    input [31:0] Rs2_in_IDEX, // operand 2 input
    input [31:0] Imm_in_IDEX , // immediate input
    input ALUSrc_B_in_IDEX , // ALU B input select
    input [3:0] ALU_control_in_IDEX, // ALU select control
    input [2:0] Branch_in_IDEX, //Beq
    input MemRW_in_IDEX, // memory read/write
    input [1:0] Jump_in_IDEX, //Jal
    input [2:0] MemtoReg_in_IDEX, // write-back select
    input RegWrite_in_IDEX, // register file write enable
    input [3:0] WHBU_in_IDEX,
    input [1:0] LSwea_in_IDEX, 
    input [31:0] StoreData_in_IDEX,
    input [3:0] wea_in_IDEX,
    output [31:0] PC_out_IDEX, // PC output
    output [4:0] Rd_addr_out_IDEX, // destination address output
    output [31:0] Rs1_out_IDEX, // operand 1 output
    output [31:0] Rs2_out_IDEX, // operand 2 output
    output [31:0] Imm_out_IDEX , // immediate
    output ALUSrc_B_out_IDEX , // ALU B select
    output [3:0] ALU_control_out_IDEX, // ALU control
    output [2:0] Branch_out_IDEX, //Beq
    output MemRW_out_IDEX, // memory read/write
    output [1:0] Jump_out_IDEX, //Jal
    output [2:0] MemtoReg_out_IDEX, // write-back select
    output RegWrite_out_IDEX, // register file write enable
    output [3:0] WHBU_out_IDEX,
    output [1:0] LSwea_out_IDEX,
    output [31:0] StoreData_out_IDEX,
    output [3:0] wea_out_IDEX
); 

Reg PC(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(PC_in_IDEX), .Q(PC_out_IDEX));
Reg Rd_addr(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(Rd_addr_IDEX), .Q(Rd_addr_out_IDEX));
Reg Rs1(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(Rs1_in_IDEX), .Q(Rs1_out_IDEX));
Reg Rs2(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(Rs2_in_IDEX), .Q(Rs2_out_IDEX));
Reg Imm(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(Imm_in_IDEX), .Q(Imm_out_IDEX));
Reg ALUSrc_B(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(ALUSrc_B_in_IDEX), .Q(ALUSrc_B_out_IDEX));
Reg ALU_control(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(ALU_control_in_IDEX), .Q(ALU_control_out_IDEX));
Reg Branch(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(Branch_in_IDEX), .Q(Branch_out_IDEX));
Reg MemRW(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(MemRW_in_IDEX), .Q(MemRW_out_IDEX));
Reg Jump(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(Jump_in_IDEX), .Q(Jump_out_IDEX));
Reg MemtoReg(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(MemtoReg_in_IDEX), .Q(MemtoReg_out_IDEX));
Reg RegWrite(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(RegWrite_in_IDEX), .Q(RegWrite_out_IDEX));
Reg WHBU(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(WHBU_in_IDEX), .Q(WHBU_out_IDEX));
Reg LSwea(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(LSwea_in_IDEX), .Q(LSwea_out_IDEX));
Reg StoreData(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(StoreData_in_IDEX), .Q(StoreData_out_IDEX));
Reg wea(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(wea_in_IDEX), .Q(wea_out_IDEX));

endmodule

Passes the control signals on to the EX stage

  1. EX/MEM register code:
module Ex_reg_Mem( 
    input clk_EXMem, // register clock
    input rst_EXMem, // register reset
    input en_EXMem, // register enable
    input [31:0] PC_in_EXMem, // PC input (PC + Imm from EX)
    input [31:0] Imm_in_EXMem,
    input [31:0] PC4_in_EXMem, // PC+4 input
    input [4:0] Rd_addr_EXMem, // write destination register address input
    input zero_in_EXMem, //zero
    input [31:0] ALU_in_EXMem, // ALU input
    input [31:0] Rs2_in_EXMem, // operand 2 input
    input [2:0] Branch_in_EXMem, //Beq
    input MemRW_in_EXMem, // memory read/write
    input [1:0] Jump_in_EXMem, //Jal
    input [2:0] MemtoReg_in_EXMem, // write-back select
    input RegWrite_in_EXMem, // register file write enable
    input [3:0] WHBU_in_EXMem,
    input [1:0] LSwea_in_EXMem, 
    input [31:0] StoreData_in_EXMem,
    input [3:0] wea_in_EXMem,
    output [31:0] PC_out_EXMem, // PC output
    output [31:0] Imm_out_EXMem, // immediate output
    output [31:0] PC4_out_EXMem, // PC+4 output
    output [4:0] Rd_addr_out_EXMem, // write destination register output
    output zero_out_EXMem, //zero
    output [31:0] ALU_out_EXMem, // ALU output
    output [31:0] Rs2_out_EXMem, // operand 2 output
    output [2:0] Branch_out_EXMem, //Beq
    output MemRW_out_EXMem, // memory read/write
    output [1:0] Jump_out_EXMem, //Jal
    output [2:0] MemtoReg_out_EXMem, // write-back select
    output RegWrite_out_EXMem, // register file write enable
    output [3:0] WHBU_out_EXMem,
    output [1:0] LSwea_out_EXMem,
    output [31:0] StoreData_out_EXMem,
    output [3:0] wea_out_EXMem
); 

Reg PC(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(PC_in_EXMem), .Q(PC_out_EXMem));
Reg Imm(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(Imm_in_EXMem), .Q(Imm_out_EXMem));
Reg PC4(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(PC4_in_EXMem), .Q(PC4_out_EXMem));
Reg Rd_addr(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(Rd_addr_EXMem), .Q(Rd_addr_out_EXMem));
Reg zero(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(zero_in_EXMem), .Q(zero_out_EXMem));
Reg ALU(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(ALU_in_EXMem), .Q(ALU_out_EXMem));
Reg Rs2(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(Rs2_in_EXMem), .Q(Rs2_out_EXMem));
Reg Branch(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(Branch_in_EXMem), .Q(Branch_out_EXMem));
Reg MemRW(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(MemRW_in_EXMem), .Q(MemRW_out_EXMem));
Reg Jump(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(Jump_in_EXMem), .Q(Jump_out_EXMem));
Reg MemtoReg(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(MemtoReg_in_EXMem), .Q(MemtoReg_out_EXMem));
Reg RegWrite(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(RegWrite_in_EXMem), .Q(RegWrite_out_EXMem));
Reg WHBU(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(WHBU_in_EXMem), .Q(WHBU_out_EXMem));
Reg LSwea(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(LSwea_in_EXMem), .Q(LSwea_out_EXMem));
Reg StoreData(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(StoreData_in_EXMem), .Q(StoreData_out_EXMem));
Reg wea(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(wea_in_EXMem), .Q(wea_out_EXMem));

endmodule

Passes the computed values on to the MEM stage

  1. MEM/WB register code:
module Mem_reg_WB( 
    input clk_MemWB, // register clock
    input rst_MemWB, // register reset
    input en_MemWB, // register enable
    input [31:0] PC_in_MemWB,
    input [31:0] Imm_in_MemWB,
    input [31:0] PC4_in_MemWB, // PC+4 input
    input [4:0] Rd_addr_MemWB, // write destination input
    input [31:0] ALU_in_MemWB, // ALU input
    input [31:0] Dmem_data_MemWB, // data memory read data
    input [2:0] MemtoReg_in_MemWB, // write-back select
    input RegWrite_in_MemWB, // register file write enable
    input [3:0] WHBU_in_MemWB,
    input [1:0] LSwea_in_MemWB,
    output [31:0] PC_out_MemWB, // PC output
    output [31:0] Imm_out_MemWB,
    output [31:0] PC4_out_MemWB, // PC+4 output
    output [4:0] Rd_addr_out_MemWB, // write destination output
    output [31:0] ALU_out_MemWB, // ALU output
    output [31:0] Dmem_data_out_MemWB, // data memory read data
    output [2:0] MemtoReg_out_MemWB, // write-back select
    output RegWrite_out_MemWB, // register file write enable
    output [3:0] WHBU_out_MemWB,
    output [1:0] LSwea_out_MemWB
);

Reg PC(.clk(clk_MemWB), .rst(rst_MemWB), .CE(en_MemWB), .D(PC_in_MemWB), .Q(PC_out_MemWB));
Reg Imm(.clk(clk_MemWB), .rst(rst_MemWB), .CE(en_MemWB), .D(Imm_in_MemWB), .Q(Imm_out_MemWB));
Reg PC4(.clk(clk_MemWB), .rst(rst_MemWB), .CE(en_MemWB), .D(PC4_in_MemWB), .Q(PC4_out_MemWB));
Reg Rd_addr(.clk(clk_MemWB), .rst(rst_MemWB), .CE(en_MemWB), .D(Rd_addr_MemWB), .Q(Rd_addr_out_MemWB));
Reg ALU(.clk(clk_MemWB), .rst(rst_MemWB), .CE(en_MemWB), .D(ALU_in_MemWB), .Q(ALU_out_MemWB));
Reg Dmem_data(.clk(clk_MemWB), .rst(rst_MemWB), .CE(en_MemWB), .D(Dmem_data_MemWB), .Q(Dmem_data_out_MemWB));
Reg MemtoReg(.clk(clk_MemWB), .rst(rst_MemWB), .CE(en_MemWB), .D(MemtoReg_in_MemWB), .Q(MemtoReg_out_MemWB));
Reg RegWrite(.clk(clk_MemWB), .rst(rst_MemWB), .CE(en_MemWB), .D(RegWrite_in_MemWB), .Q(RegWrite_out_MemWB));
Reg WHBU(.clk(clk_MemWB), .rst(rst_MemWB), .CE(en_MemWB), .D(WHBU_in_MemWB), .Q(WHBU_out_MemWB));
Reg LSwea(.clk(clk_MemWB), .rst(rst_MemWB), .CE(en_MemWB), .D(LSwea_in_MemWB), .Q(LSwea_out_MemWB));
endmodule

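Key simulation steps

  1. Simulation assembly:

The simulation program is the Lab 4-3 program with \(3\) nop instructions inserted between every two consecutive instructions.

At the end of the run, register x31 holds 666.

  1. testbench module: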
module testbench(
    input wire clk,
    input wire rst
);
    wire [31:0] Addr_out;
    wire [31:0] Data_out;       
    wire        CPU_MIO;
    wire        MemRW_Mem;
    wire [3:0]  memrw;
    wire [3:0] MEMRW;
    wire [31:0] PC_out_IF;
    wire [31:0] douta;
    wire [31:0] spo;
    wire [3:0] wea;
assign memrw = {MemRW_Mem, MemRW_Mem, MemRW_Mem, MemRW_Mem};
assign MEMRW = memrw & wea;
Pipeline_CPU u0(
         .clk(clk),
         .rst(rst),
         .Data_in(douta),
         .inst_IF(spo), 
         .Addr_out(Addr_out),
         .Data_out(Data_out), 
         .PC_out_IF(PC_out_IF), 
         .MemRW_Mem(MemRW_Mem),
         .wea(wea)  
    );

    RAM_B u1(
        .clka(~clk),
        .wea(MEMRW),
        .addra(Addr_out[11:2]),
        .dina(Data_out),
        .douta(douta)
    );

    ROM_D u2(
        .a(PC_out_IF[11:2]),
        .spo(spo)
    );

endmodule

This module instantiates the pipelined CPU, the RAM, and the ROM, and wires them together.

  1. Simulation code:
module sim();
    reg clk;
    reg rst;
    testbench m0(.clk(clk), .rst(rst));
    initial begin
        clk = 1'b0;
        rst = 1'b1;
        #50;
        rst = 1'b0;
    end
    always #10 clk = ~clk;
endmodule

The top-level simulation module instantiates one testbench module and drives its clock and reset signals.

Results and analysis

Simulation results

alt text

  1. Initially, all registers are 0.

    The pipelined CPU then runs normally, and the intermediate results match those in Lab 4-3.

alt text

  1. At the end, register x31 = 666, so the simulation succeeded.

On-board results

Lab 5-2

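Procedure and experimental steps

Code design and explanation

Stall

When a hazard occurs in the pipelined CPU, a stall can be implemented by freezing the IF and ID stages and inserting a nop into the EX stage.

For a Data Hazard: if the rd of an instruction in the EX, MEM, or WB stage equals an rs of the instruction in the ID stage, there is a hazard and the pipeline must stall for one cycle.

For a Control Hazard: if the Branch or Jump signal in the EX or MEM stage is \(1\), the instruction is a branch or jump and the pipeline must stall for one cycle.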

Structural Hazard

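If the register file writes on the rising clock edge, a value written in a given cycle cannot be read out in that same cycle, which costs one extra stall cycle.

The write is therefore moved to the falling clock edge. The value is then written half a cycle earlier and can be read out in the next full cycle, so a data hazard can be resolved with only two stall cycles.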

Data Hazard

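I handle data hazards with forwarding.

When the rd of the instruction in the MEM or WB stage equals an rs of the instruction in the EX stage (the ForwardingUnit compares against IDEX_Rs1 and IDEX_Rs2), there is a hazard. The rd value is then bypassed back to the EX stage, and the ALU is steered to use the forwarded value in place of rs.

To handle the Load-Use case, the stall control module stalls once while the load is in the EX stage; forwarding then covers the rest.

To handle the Use-Store case, the generation of StoreData is moved from the ID stage to the EX stage, and the forwarded data is selected as StoreData.

Finally, the Lui-Use case: in my implementation of lui, the immediate ImmOut does not pass through the ALU and can only be written back once it reaches the WB stage.

So, to bypass the lui immediate from the WB stage back to the EX stage, the pipeline stalls for one extra cycle while lui is in the EX stage. The immediate can then be forwarded from the WB stage to the EX stage.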

Forwarding

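Generating the forwarding control signals

If the rd of the instruction in the MEM stage equals an rs of the instruction in the EX stage, Forward = 10 is generated (select ALU_out_EXMem).

If the rd of the instruction in the WB stage equals an rs of the instruction in the EX stage, Forward = 01 is generated (select Data_out_WB).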

Control Hazard

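I handle control hazards with stalls only, although after writing it I realized the result is a curious hybrid with Branch Always Not Taken.

When a branch or jump is in the EX stage, the pipeline stalls for one cycle. When the instruction is in the MEM stage, PCSrc is also produced in that stage, so I check PCSrc to decide whether to flush the values held in the IF and ID stage registers.

If the branch is taken, the IF and ID stages are flushed, which amounts to \(3\) stall cycles. If it is not taken, the pipeline keeps running, which amounts to only the \(1\) stall cycle in the EX stage.

Source code

  1. RegFile module: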
  always @(negedge clk /*double bump*/ or posedge rst) begin
        if (rst) begin
            for (i = 0; i < 32; i = i + 1) begin
                Reg[i] <= 32'b0;
            end
        end
        else begin
            if (RegWrite && (Wt_addr != 5'b0)) begin
                Reg[Wt_addr] <= Wt_data;
            end
            else begin
                Reg[Wt_addr] <= Reg[Wt_addr];
            end

        end
    end
    assign Rs1_data = Reg[Rs1_addr];
    assign Rs2_data = Reg[Rs2_addr];

The register file now writes on the falling clock edge.

  1. Stall module:
module stall_control (
    input wire [4:0] ID_rs1,         // source register rs1 of the instruction in ID
    input wire [4:0] ID_rs2,         // source register rs2 of the instruction in ID
    input wire [4:0] IDEX_rd,      // rd held in the ID/EX register (instruction in EX) // rd_out
    input wire [4:0] EXMem_rd,     // rd held in the EX/MEM register (instruction in MEM) // rd_out
    input wire [4:0] MemWB_rd,     // rd held in the MEM/WB register (instruction in WB) // rd_out
    input wire MemWB_RegWrite,    //RegWrite_out  
    input wire EXMem_RegWrite,    //RegWrite_out  
    input wire IDEX_RegWrite,   //RegWrite_out
    input wire [2:0] IDEX_MemtoReg,     //MemtoReg_out
    input wire [2:0] Mem_PCSrc,  
    input wire [2:0] IDEX_Branch,
    input wire [2:0] EXMem_Branch,
    input wire [1:0] IDEX_Jump,
    input wire [1:0] EXMem_Jump,
    output reg PC_en, // PC write enable
    output reg IFID_en, // IF/ID write enable
    output reg IDEX_Bubble, // insert a bubble into ID/EX
    output reg IFID_flush,
    output reg IDEX_flush
);
    wire stall, DataHazard, ControlHazard;

    // with forwarding
    assign DataHazard = ( 
        (IDEX_RegWrite == 1'b1 && IDEX_MemtoReg == 3'b1 && (IDEX_rd != 5'b0) && 
        ((ID_rs1 == IDEX_rd) || (ID_rs2 == IDEX_rd))) || // Load-Use Hazard
        (IDEX_RegWrite == 1'b1 && IDEX_MemtoReg == 3'b11 && (IDEX_rd != 5'b0) && 
        ((ID_rs1 == IDEX_rd) || (ID_rs2 == IDEX_rd))) // Lui-Use Hazard
    );

    // without forwarding

    // assign DataHazard = ( 
    //     (IDEX_RegWrite && (IDEX_rd != 5'b0) && 
    //     ((ID_rs1 == IDEX_rd) || (ID_rs2 == IDEX_rd))) || 
    //     (EXMem_RegWrite && (EXMem_rd != 5'b0) && 
    //     ((ID_rs1 == EXMem_rd) || (ID_rs2 == EXMem_rd)))
    //     // || 
    //     // (MemWB_RegWrite && (MemWB_rd != 5'b0) && 
    //     // ((ID_rs1 == MemWB_rd) || (ID_rs2 == MemWB_rd)))
    // );

    assign ControlHazard = ((IDEX_Branch != 3'b0) || (EXMem_Branch != 3'b0) || 
        (IDEX_Jump != 2'b0) || (EXMem_Jump != 2'b0)); 

    assign stall = (DataHazard || ControlHazard);

    always @(*) begin 
        if (stall) begin
            if (EXMem_Branch != 3'b0 || EXMem_Jump != 2'b0) begin
                PC_en = 1;
                IFID_en = 1;
                IDEX_Bubble = 0;
                if(Mem_PCSrc != 3'b0) begin
                    IFID_flush = 1;
                    IDEX_flush = 1;
                end
                else begin
                    IFID_flush = 0;
                    IDEX_flush = 0;
                end
            end
            else begin
                PC_en = 0;
                IFID_en = 0;
                IFID_flush = 0;
                IDEX_flush = 0;
                IDEX_Bubble = 1;
            end
        end
        else begin
            PC_en = 1;        
            IFID_en = 1;     
            IFID_flush = 0;
            IDEX_flush = 0;
            IDEX_Bubble = 0;       
        end
    end
endmodule
  1. Forwarding module:
    module ForwardingUnit(
        input [4:0] EXMem_Rd,
        input [4:0] MemWB_Rd,
        input [4:0] IDEX_Rs1,
        input [4:0] IDEX_Rs2,
        input       EXMem_RegWrite,
        input       MemWB_RegWrite,
        output reg [1:0] ForwardA,
        output reg [1:0] ForwardB
    );
    
    wire EXMemForwardA, EXMemForwardB, MemWBForwardA, MemWBForwardB;
    assign EXMemForwardA = EXMem_RegWrite && (EXMem_Rd != 0) && (EXMem_Rd == IDEX_Rs1);
    assign EXMemForwardB = EXMem_RegWrite && (EXMem_Rd != 0) && (EXMem_Rd == IDEX_Rs2);
    assign MemWBForwardA = MemWB_RegWrite && (MemWB_Rd != 0) && (MemWB_Rd == IDEX_Rs1);
    assign MemWBForwardB = MemWB_RegWrite && (MemWB_Rd != 0) && (MemWB_Rd == IDEX_Rs2);
    
    always @(*) begin
        // Forwarding for Rs1
        if (EXMemForwardA) begin
            ForwardA = 2'b10; // Forward from EX/MEM
        end else if (MemWBForwardA && (~EXMemForwardA)) begin
            ForwardA = 2'b01; // Forward from MEM/WB
        end else begin
            ForwardA = 2'b00; // No forwarding
        end
    
        // Forwarding for Rs2
        if (EXMemForwardB) begin
            ForwardB = 2'b10; // Forward from EX/MEM
        end else if (MemWBForwardB && (~EXMemForwardB)) begin
            ForwardB = 2'b01; // Forward from MEM/WB
        end else begin
            ForwardB = 2'b00; // No forwarding
        end
    end
    
    endmodule
    
  2. Modified EX stage code:
module Pipeline_Ex( 
    input [31:0] PC_in_EX, // PC input
    input [31:0] Rs1_in_EX, // operand 1 input
    input [31:0] Rs2_in_EX, // operand 2 input
    input [31:0] Imm_in_EX, // immediate
    input [3:0] WHBU_in_EX,
    input [1:0] LSwea_in_EX,
    input ALUSrc_B_in_EX, // ALU B operand select
    input [3:0] ALU_control_in_EX, // ALU operation select
    input [1:0] ForwardA, 
    input [1:0] ForwardB, 
    input [31:0] ALU_out_EXMem,
    input [31:0] Data_out_WB,
    output [31:0] PC_out_EX, // PC output, PC + Imm
    output [31:0] PC4_out_EX, // PC+4 output
    output zero_out_EX, // ALU zero flag output
    output [31:0] ALU_out_EX, // ALU result output
    output [31:0] Rs2_out_EX, // operand 2 output
    output reg [31:0] StoreData_out_EX, // store data output
    output reg [3:0] wea_out_EX // store write-enable output
);
wire [31:0] ALUB;
assign PC_out_EX = PC_in_EX + Imm_in_EX; // PC_out = PC + Imm
assign PC4_out_EX = PC_in_EX + 32'd4;


reg [31:0] ForwardA_data, ForwardB_data;

always @(*) begin
    case (ForwardA)
        2'b00: ForwardA_data = Rs1_in_EX;
        2'b01: ForwardA_data = Data_out_WB; // Load data / previous ALU result
        2'b10: ForwardA_data = ALU_out_EXMem; // ALU result
        default: ForwardA_data = 32'b0;
    endcase
end

always @(*) begin
    case (ForwardB)
        2'b00: ForwardB_data = Rs2_in_EX;
        2'b01: ForwardB_data = Data_out_WB;
        2'b10: ForwardB_data = ALU_out_EXMem;
        default: ForwardB_data = 32'b0;
    endcase
end

assign Rs2_out_EX = ForwardB_data;

assign ALUB = (ALUSrc_B_in_EX) ? Imm_in_EX : ForwardB_data;
ALU alu(.A(ForwardA_data), .B(ALUB), .ALU_operation(ALU_control_in_EX), .res(ALU_out_EX), 
    .zero(zero_out_EX));

always @ (*) begin // StoreData is now generated in the EX stage
    case (LSwea_in_EX)
        2'b00: begin
            StoreData_out_EX = Rs2_out_EX;
            case (WHBU_in_EX)
                4'b0010: wea_out_EX = 4'b0001;
                4'b0100: wea_out_EX = 4'b0011;
                4'b1000: wea_out_EX = 4'b1111;
                default: wea_out_EX = 0;
            endcase
        end
        2'b01: begin
            StoreData_out_EX = {Rs2_out_EX[23:0], 8'b0};
            case (WHBU_in_EX)
                4'b0010: wea_out_EX = 4'b0010;
                4'b0100: wea_out_EX = 4'b0110;
                default: wea_out_EX = 0;
            endcase
        end
        2'b10: begin
            StoreData_out_EX = {Rs2_out_EX[15:0], 16'b0};
            case (WHBU_in_EX)
                4'b0010: wea_out_EX = 4'b0100;
                4'b0100: wea_out_EX = 4'b1100;
                default: wea_out_EX = 0;
            endcase
        end
        2'b11: begin
            StoreData_out_EX = {Rs2_out_EX[7:0], 24'b0};
            case (WHBU_in_EX)
                4'b0010: wea_out_EX = 4'b1000;
                default: wea_out_EX = 0;
            endcase
        end
    endcase
end    

endmodule

StoreData is now generated in the EX stage, and ALU_out_EXMem and Data_out_WB are wired in as the selectable forwarding values.

  1. Top-level CPU code:
    module Pipeline_CPU(
        input clk,
        input rst,
        input [31:0] Data_in,  // data input
        input [31:0] inst_IF,  // instruction input
        output [31:0] Addr_out,  // address output
        output [31:0] Data_out,  // data output
        output [31:0] Data_out_WB,  // write-back data output
        output [31:0] PC_out_IF,  // IF-stage PC output
        output [31:0] inst_ID,  // ID-stage instruction output
        output [31:0] PC_out_ID,  // ID-stage PC output
        output [31:0] PC_out_EX,  // EX-stage PC output
        output MemRW_EX,  // EX-stage memory read/write
        output MemRW_Mem,  // MEM-stage memory read/write
        output [3:0] wea,
        output [31:0] Reg00, output [31:0] Reg01,output [31:0] Reg02,output [31:0] Reg03,
        output [31:0] Reg04, output [31:0] Reg05,output [31:0] Reg06,output [31:0] Reg07,
        output [31:0] Reg08, output [31:0] Reg09,output [31:0] Reg10,output [31:0] Reg11,
        output [31:0] Reg12, output [31:0] Reg13,output [31:0] Reg14,output [31:0] Reg15,
        output [31:0] Reg16, output [31:0] Reg17,output [31:0] Reg18,output [31:0] Reg19,
        output [31:0] Reg20, output [31:0] Reg21,output [31:0] Reg22,output [31:0] Reg23,
        output [31:0] Reg24, output [31:0] Reg25,output [31:0] Reg26,output [31:0] Reg27,
        output [31:0] Reg28, output [31:0] Reg29,output [31:0] Reg30,output [31:0] Reg31
    
    );
        wire [31:0] PC_out_EXMem, ALU_out_EXMem;
        wire [4:0] Rd_addr_out_MemWB;
        wire RegWrite_out_MemWB;
        wire [2:0] PCSrc;
        wire PC_en, IFID_en, IDEX_Bubble;
        Pipeline_IF PPLIF (
            .clk_IF(clk),
            .rst_IF(rst),
            .en_IF(PC_en),
            .PC_in_IF(PC_out_EXMem),
            .PCSrc(PCSrc),
            .ALU_in_IF(ALU_out_EXMem),
            .PC_out_IF(PC_out_IF)
    
        );
        wire [31:0] PC_out_IFID, inst_out_IFID;
    
        wire IFID_flush, IDEX_flush;
        IF_reg_ID PPLIFID (
            .clk_IFID(clk),
            .rst_IFID(rst),
            .flush(IFID_flush),
            .en_IFID(IFID_en), 
            .PC_in_IFID(PC_out_IF),
            .inst_in_IFID(inst_IF),
            .PC_out_IFID(PC_out_IFID),
            .inst_out_IFID(inst_out_IFID)
        );
    
    
        wire [31:0] Imm_out_ID, Rs1_out_ID, Rs2_out_ID;
        wire ALUSrc_B_ID, MemRW_ID, RegWrite_out_ID;
        wire [4:0] Rd_addr_out_ID;
        wire [3:0] ALU_control_ID;
        wire [2:0] Branch_ID;
        wire [1:0] Jump_ID;
        wire [2:0] MemtoReg_ID;
        wire [3:0] WHBU_ID;
        wire [1:0] LSwea_ID;
        wire [31:0] StoreData_ID;
        wire [3:0] wea_ID;
    
    
        Pipeline_ID PPLID (
            .clk_ID(clk),
            .rst_ID(rst),
            .RegWrite_in_ID(RegWrite_out_MemWB), 
            .Rd_addr_ID(Rd_addr_out_MemWB), 
            .Wt_data_ID(Data_out_WB), 
            .inst_in_ID(inst_out_IFID), 
            .Rd_addr_out_ID(Rd_addr_out_ID),
            .Rs1_out_ID(Rs1_out_ID), 
            .Rs2_out_ID(Rs2_out_ID), 
            .Imm_out_ID(Imm_out_ID), 
            .ALUSrc_B_ID(ALUSrc_B_ID), 
            .ALU_control_ID(ALU_control_ID),
            .Branch_ID(Branch_ID), 
            .MemRW_ID(MemRW_ID),
            .Jump_ID(Jump_ID), 
            .MemtoReg_ID(MemtoReg_ID), 
            .RegWrite_out_ID(RegWrite_out_ID),
            .Rs1_addr_out_ID(Rs1_addr_out_ID),
            .Rs2_addr_out_ID(Rs2_addr_out_ID),
            .WHBU_ID(WHBU_ID),
            .LSwea_ID(LSwea_ID),
            // .StoreData_ID(StoreData_ID),
            // .wea_ID(wea_ID),
            .Reg00(Reg00),.Reg01(Reg01),.Reg02(Reg02),.Reg03(Reg03),
            .Reg04(Reg04),.Reg05(Reg05),.Reg06(Reg06),.Reg07(Reg07),
            .Reg08(Reg08),.Reg09(Reg09),.Reg10(Reg10),.Reg11(Reg11),
            .Reg12(Reg12),.Reg13(Reg13),.Reg14(Reg14),.Reg15(Reg15),
            .Reg16(Reg16),.Reg17(Reg17),.Reg18(Reg18),.Reg19(Reg19),
            .Reg20(Reg20),.Reg21(Reg21),.Reg22(Reg22),.Reg23(Reg23),
            .Reg24(Reg24),.Reg25(Reg25),.Reg26(Reg26),.Reg27(Reg27),
            .Reg28(Reg28),.Reg29(Reg29),.Reg30(Reg30),.Reg31(Reg31)
        );
        wire [31:0] PC_out_IDEX, Imm_out_IDEX, Rs1_out_IDEX, Rs2_out_IDEX; 
        wire [4:0]  Rd_addr_out_IDEX, Rs1_addr_out_ID, Rs2_addr_out_ID;
        wire ALUSrc_B_out_IDEX, MemRW_out_IDEX, RegWrite_out_IDEX;
    
        wire [3:0] ALU_control_out_IDEX;
        wire [2:0] Branch_out_IDEX;
        wire [1:0] Jump_out_IDEX;
        wire [2:0] MemtoReg_out_IDEX;
        wire [3:0] WHBU_out_IDEX;
        wire [1:0] LSwea_out_IDEX;
        wire [31:0] StoreData_out_IDEX;
        wire [4:0] Rs1_addr_out_IDEX, Rs2_addr_out_IDEX;
        wire [3:0] wea_out_IDEX;
    
    
        ID_reg_Ex PPLIDEx (
            .clk_IDEX(clk),
            .rst_IDEX(rst),
            .en_IDEX(1'b1),
            .Bubble(IDEX_Bubble),
            .flush(IDEX_flush),
            .PC_in_IDEX(PC_out_IFID),
            .Rd_addr_IDEX(Rd_addr_out_ID),
            .Rs1_addr_IDEX(Rs1_addr_out_ID),
            .Rs2_addr_IDEX(Rs2_addr_out_ID),
            .Rs1_in_IDEX(Rs1_out_ID),
            .Rs2_in_IDEX(Rs2_out_ID),
            .Imm_in_IDEX(Imm_out_ID),
            .ALUSrc_B_in_IDEX(ALUSrc_B_ID),
            .ALU_control_in_IDEX(ALU_control_ID),
            .Branch_in_IDEX(Branch_ID),
            .Jump_in_IDEX(Jump_ID),
            .MemRW_in_IDEX(MemRW_ID),
            .MemtoReg_in_IDEX(MemtoReg_ID),
            .RegWrite_in_IDEX(RegWrite_out_ID),
            .WHBU_in_IDEX(WHBU_ID),
            .LSwea_in_IDEX(LSwea_ID),
            // .StoreData_in_IDEX(StoreData_ID),
            // .wea_in_IDEX(wea_ID),
            .PC_out_IDEX(PC_out_IDEX),
            .Rd_addr_out_IDEX(Rd_addr_out_IDEX),
            .Rs1_addr_out_IDEX(Rs1_addr_out_IDEX),
            .Rs2_addr_out_IDEX(Rs2_addr_out_IDEX),
            .Rs1_out_IDEX(Rs1_out_IDEX),
            .Rs2_out_IDEX(Rs2_out_IDEX),
            .Imm_out_IDEX(Imm_out_IDEX),
            .ALUSrc_B_out_IDEX(ALUSrc_B_out_IDEX),
            .ALU_control_out_IDEX(ALU_control_out_IDEX),
            .Branch_out_IDEX(Branch_out_IDEX),
            .Jump_out_IDEX(Jump_out_IDEX),
            .MemRW_out_IDEX(MemRW_out_IDEX),
            .MemtoReg_out_IDEX(MemtoReg_out_IDEX),
            .RegWrite_out_IDEX(RegWrite_out_IDEX),
            .WHBU_out_IDEX(WHBU_out_IDEX),
            .LSwea_out_IDEX(LSwea_out_IDEX)
            // .StoreData_out_IDEX(StoreData_out_IDEX),
            // .wea_out_IDEX(wea_out_IDEX)
        );
        wire [31:0] PC4_out_EX, ALU_out_EX;
        wire [31:0] Rs2_out_EX;
        wire zero_out_EX;
        wire [1:0] ForwardA, ForwardB;
    
    
        // newly added Forwarding module
        ForwardingUnit PPLForwarding (
            .IDEX_Rs1(Rs1_addr_out_IDEX),
            .IDEX_Rs2(Rs2_addr_out_IDEX),
            .EXMem_Rd(Rd_addr_out_EXMem),
            .MemWB_Rd(Rd_addr_out_MemWB),
            .EXMem_RegWrite(RegWrite_out_EXMem),
            .MemWB_RegWrite(RegWrite_out_MemWB),
            .ForwardA(ForwardA),
            .ForwardB(ForwardB)
        );
    
        wire [31:0] StoreData_out_EX;
        wire [3:0] wea_out_EX;
        // StoreData is now generated in EX
        Pipeline_Ex PPLEx (
            .PC_in_EX(PC_out_IDEX),
            .Rs1_in_EX(Rs1_out_IDEX),
            .Rs2_in_EX(Rs2_out_IDEX),
            .Imm_in_EX(Imm_out_IDEX),
            .WHBU_in_EX(WHBU_out_IDEX),
            .LSwea_in_EX(LSwea_out_IDEX),
            .ALUSrc_B_in_EX(ALUSrc_B_out_IDEX),
            .ALU_control_in_EX(ALU_control_out_IDEX),
            .ForwardA(ForwardA),
            .ForwardB(ForwardB),
            .ALU_out_EXMem(ALU_out_EXMem),
            .Data_out_WB(Data_out_WB),
            .PC_out_EX(PC_out_EX),
            .PC4_out_EX(PC4_out_EX),
            .zero_out_EX(zero_out_EX),
            .ALU_out_EX(ALU_out_EX),
            .Rs2_out_EX(Rs2_out_EX),
            .StoreData_out_EX(StoreData_out_EX),
            .wea_out_EX(wea_out_EX)
        );
    
        wire [31:0] PC4_out_EXMem, Imm_out_EXMem, Rs2_out_EXMem;
        wire [4:0] Rd_addr_out_EXMem;
        wire zero_out_EXMem, MemRW_out_EXMem, RegWrite_out_EXMem;
        wire [2:0] Branch_out_EXMem;
        wire [1:0] Jump_out_EXMem;
        wire [2:0] MemtoReg_out_EXMem;
        wire [3:0] WHBU_out_EXMem;
        wire [1:0] LSwea_out_EXMem;
        wire [31:0] StoreData_out_EXMem;
        wire [3:0] wea_out_EXMem;
    
        Ex_reg_Mem PPLExMem (
            .clk_EXMem(clk),
            .rst_EXMem(rst),
            .en_EXMem(1'b1),
            .PC_in_EXMem(PC_out_EX),
            .Imm_in_EXMem(Imm_out_IDEX),
            .PC4_in_EXMem(PC4_out_EX),
            .Rd_addr_EXMem(Rd_addr_out_IDEX),
            .zero_in_EXMem(zero_out_EX),
            .ALU_in_EXMem(ALU_out_EX),
            .Rs2_in_EXMem(Rs2_out_EX),
            .Branch_in_EXMem(Branch_out_IDEX),
            .Jump_in_EXMem(Jump_out_IDEX),
            .MemRW_in_EXMem(MemRW_out_IDEX),
            .MemtoReg_in_EXMem(MemtoReg_out_IDEX),
            .RegWrite_in_EXMem(RegWrite_out_IDEX),
            .WHBU_in_EXMem(WHBU_out_IDEX),
            .LSwea_in_EXMem(LSwea_out_IDEX),
            .StoreData_in_EXMem(StoreData_out_EX),
            .wea_in_EXMem(wea_out_EX),
            .PC_out_EXMem(PC_out_EXMem),
            .Imm_out_EXMem(Imm_out_EXMem),
            .PC4_out_EXMem(PC4_out_EXMem),
            .Rd_addr_out_EXMem(Rd_addr_out_EXMem),
            .zero_out_EXMem(zero_out_EXMem),
            .ALU_out_EXMem(ALU_out_EXMem),
    
            .Rs2_out_EXMem(Rs2_out_EXMem),
            .Branch_out_EXMem(Branch_out_EXMem),
            .Jump_out_EXMem(Jump_out_EXMem),
            .MemRW_out_EXMem(MemRW_out_EXMem),
            .MemtoReg_out_EXMem(MemtoReg_out_EXMem),
            .RegWrite_out_EXMem(RegWrite_out_EXMem),
            .WHBU_out_EXMem(WHBU_out_EXMem),
            .LSwea_out_EXMem(LSwea_out_EXMem),
            .StoreData_out_EXMem(StoreData_out_EXMem),
            .wea_out_EXMem(wea_out_EXMem)
        );
        Pipeline_Mem PPLMem (
            .zero_in_Mem(zero_out_EXMem),
            .res_in_Mem(ALU_out_EXMem),
            .Branch_in_Mem(Branch_out_EXMem),
            .Jump_in_Mem(Jump_out_EXMem),
            .PCSrc(PCSrc)
        );
        wire [31:0] PC_out_MemWB, Imm_out_MemWB, PC4_out_MemWB, 
            ALU_out_MemWB, Dmem_data_out_MemWB;
    
        wire [2:0] MemtoReg_out_MemWB;
        //wire RegWrite_out_MemWB;
        wire [3:0] WHBU_out_MemWB;
        wire [1:0] LSwea_out_MemWB;
    
        Mem_reg_WB PPLMemWB (
            .clk_MemWB(clk),
            .rst_MemWB(rst),
            .en_MemWB(1'b1),
            .PC_in_MemWB(PC_out_EXMem),
            .Imm_in_MemWB(Imm_out_EXMem),
            .PC4_in_MemWB(PC4_out_EXMem),
            .Rd_addr_MemWB(Rd_addr_out_EXMem),
            .ALU_in_MemWB(ALU_out_EXMem),
            .Dmem_data_MemWB(Data_in),
            .MemtoReg_in_MemWB(MemtoReg_out_EXMem),
            .RegWrite_in_MemWB(RegWrite_out_EXMem),
            .WHBU_in_MemWB(WHBU_out_EXMem),
            .LSwea_in_MemWB(LSwea_out_EXMem),
            .PC_out_MemWB(PC_out_MemWB),
            .Imm_out_MemWB(Imm_out_MemWB),
            .PC4_out_MemWB(PC4_out_MemWB),
            .Rd_addr_out_MemWB(Rd_addr_out_MemWB),
            .ALU_out_MemWB(ALU_out_MemWB),
            .Dmem_data_out_MemWB(Dmem_data_out_MemWB),
            .MemtoReg_out_MemWB(MemtoReg_out_MemWB),
            .RegWrite_out_MemWB(RegWrite_out_MemWB),
            .WHBU_out_MemWB(WHBU_out_MemWB),
            .LSwea_out_MemWB(LSwea_out_MemWB)
        );
    
        Pipeline_WB PPLWB (
            .PC4_in_WB(PC4_out_MemWB),
            .ALU_in_WB(ALU_out_MemWB),
            .Dmem_data_WB(Dmem_data_out_MemWB),
            .Imm_in_WB(Imm_out_MemWB),
            .PC_in_WB(PC_out_MemWB),
            .MemtoReg_in_WB(MemtoReg_out_MemWB),
            .WHBU_in_WB(WHBU_out_MemWB),
            .LSwea_in_WB(LSwea_out_MemWB),
            .Data_out_WB(Data_out_WB)
        );
        assign MemRW_EX = MemRW_out_IDEX;
        assign MemRW_Mem = MemRW_out_EXMem;
        assign Addr_out = ALU_out_EXMem;
        assign Data_out = StoreData_out_EXMem;
        assign wea = wea_out_EXMem;
        assign PC_out_ID = PC_out_IFID;
        assign inst_ID = inst_out_IFID;
        // newly added stall module
        stall_control sc(.ID_rs1(Rs1_addr_out_ID), .ID_rs2(Rs2_addr_out_ID), 
            .IDEX_rd(Rd_addr_out_IDEX), .EXMem_rd(Rd_addr_out_EXMem), 
            .MemWB_rd(Rd_addr_out_MemWB), 
            .EXMem_RegWrite(RegWrite_out_EXMem), .IDEX_RegWrite(RegWrite_out_IDEX), 
            .MemWB_RegWrite(RegWrite_out_MemWB), .IDEX_MemtoReg(MemtoReg_out_IDEX),
            .IDEX_Branch(Branch_out_IDEX), .EXMem_Branch(Branch_out_EXMem), 
            .IDEX_Jump(Jump_out_IDEX), .EXMem_Jump(Jump_out_EXMem), 
            .Mem_PCSrc(PCSrc),
            .PC_en(PC_en), .IFID_en(IFID_en), .IDEX_Bubble(IDEX_Bubble), 
            .IFID_flush(IFID_flush), .IDEX_flush(IDEX_flush)
        );
    endmodule
    

Key simulation steps

  1. Simulation assembly
    auipc x1, 0
    j     start            # 00
dummy:
    nop                    # 04
    nop                    # 08
    nop                    # 0C
    nop                    # 10
    nop                    # 14
    nop                    # 18
    nop                    # 1C
    j     dummy

start:
    bnez  x1, dummy
    beq   x0, x0, pass_0
    li    x31, 0
    auipc x30, 0
    j     dummy
pass_0:
    li    x31, 1
    bne   x0, x0, dummy
    bltu  x0, x0, dummy
    li    x1, -1           # x1=FFFFFFFF
    xori  x3, x1, 1        # x3=FFFFFFFE
    add   x3, x3, x3       # x3=FFFFFFFC
    add   x3, x3, x3       # x3=FFFFFFF8
    add   x3, x3, x3       # x3=FFFFFFF0    
    add   x3, x3, x3       # x3=FFFFFFE0
    add   x3, x3, x3       # x3=FFFFFFC0
    add   x3, x3, x3       # x3=FFFFFF80
    add   x3, x3, x3       # x3=FFFFFF00
    add   x3, x3, x3       # x3=FFFFFE00
    add   x3, x3, x3       # x3=FFFFFC00
    add   x3, x3, x3       # x3=FFFFF800
    add   x3, x3, x3       # x3=FFFFF000
    add   x3, x3, x3       # x3=FFFFE000
    add   x3, x3, x3       # x3=FFFFC000
    add   x3, x3, x3       # x3=FFFF8000
    add   x3, x3, x3       # x3=FFFF0000
    add   x3, x3, x3       # x3=FFFE0000
    add   x3, x3, x3       # x3=FFFC0000
    add   x3, x3, x3       # x3=FFF80000
    add   x3, x3, x3       # x3=FFF00000
    add   x3, x3, x3       # x3=FFE00000
    add   x3, x3, x3       # x3=FFC00000
    add   x3, x3, x3       # x3=FF800000
    add   x3, x3, x3       # x3=FF000000
    add   x3, x3, x3       # x3=FE000000
    add   x3, x3, x3       # x3=FC000000
    add   x5, x3, x3       # x5=F8000000
    add   x3, x5, x5       # x3=F0000000
    add   x4, x3, x3       # x4=E0000000
    add   x6, x4, x4       # x6=C0000000
    add   x7, x6, x6       # x7=80000000
    ori   x8, zero, 1      # x8=00000001
    ori   x28, zero, 31
    srl   x29, x7, x28     # x29=00000001
    auipc x30, 0
    bne   x8, x29, dummy
    auipc x30, 0
    blt   x8, x7, dummy
    sra   x29, x7, x28     # x29=FFFFFFFF
    and   x29, x29, x3     # x29=x3=F0000000
    auipc x30, 0
    bne   x3, x29, dummy
    mv    x29, x8          # x29=x8=00000001
    bltu  x29, x7, pass_1  # unsigned 00000001 < 80000000
    auipc x30, 0
    j     dummy

pass_1:
    nop
    li    x31, 2
    sub   x3, x6, x7       # x3=40000000
    sub   x4, x7, x3       # x4=40000000
    slti  x9, x0, 1        # x9=00000001
    slt   x10, x3, x4
    slt   x10, x4, x3      # x10=00000000
    auipc x30, 0
    beq   x9, x10, dummy   # branch when x3 != x4
    srli  x29, x3, 30      # x29=00000001
    beq   x29, x9, pass_2
    auipc x30, 0
    j     dummy

pass_2:
    nop
# Test set-less-than
    li    x31, 3
    slti  x10, x1, 3       # x10=00000001
    slt   x11, x5, x1      # signed(0xF8000000) < -1
                        # x11=00000001
    slt   x12, x1, x3      # x12=00000001
    andi  x10, x10, 0xff
    and   x10, x10, x11
    and   x10, x10, x12    # x10=00000001
    auipc x30, 0
    beqz  x10, dummy
    sltu  x10, x1, x8      # unsigned FFFFFFFF < 00000001 ?
    auipc x30, 0
    bnez  x10, dummy
    sltu  x10, x8, x3      # unsigned 00000001 < F0000000 ?
    auipc x30, 0
    beqz  x10, dummy
    sltiu x10, x1, 3
    auipc x30, 0
    bnez  x10, dummy
    li    x11, 1
    bne   x10, x11, pass_3
    auipc x30, 0
    j     dummy

pass_3:
    nop
    li    x31, 4
    or    x11, x7, x3      # x11=C0000000
    beq   x11, x6, pass_4
    auipc x30, 0
    j     dummy

pass_4:
    nop
    li    x31, 5
    li    x18, 0x20        # base addr=00000020
### uncomment instr. below when simulating on venus
    # lui   x18, 0x10000     # base addr=10000000
    sw    x5, 0(x18)       # mem[0x20]=F8000000
    sw    x4, 4(x18)       # mem[0x24]=40000000
    lw    x27, 0(x18)      # x27=mem[0x20]=F8000000
    xor   x27, x27, x5     # x27=00000000
    sw    x6, 0(x18)       # mem[0x20]=C0000000
    lw    x28, 0(x18)      # x28=mem[0x20]=C0000000
    xor   x27, x6, x28     # x27=00000000
    auipc x30, 0
    bnez  x27, dummy
    lui   x20, 0xA0000     # x20=A0000000
    sw    x20, 8(x18)      # mem[0x28]=A0000000
    lui   x27, 0xFEDCB     # x27=FEDCB000
    srai  x27, x27, 12     # x27=FFFFEDCB
    li    x28, 8
    sll   x27, x27, x28    # x27=FFEDCB00
    ori   x27, x27, 0xff   # x27=FFEDCBFF
    lb    x29, 11(x18)     # x29=FFFFFFA0, little-endian, signed-ext
    and   x27, x27, x29    # x27=FFEDCBA0
    sw    x27, 8(x18)      # mem[0x28]=FFEDCBA0
    lhu   x27, 8(x18)      # x27=0000CBA0
    lui   x20, 0xFFFF0     # x20=FFFF0000
    and   x20, x20, x27    # x20=00000000
    auipc x30, 0
    bnez  x20, dummy       # check unsigned-ext
    li    x31, 6
    lbu   x28, 10(x18)     # x28=000000ED
    lbu   x29, 11(x18)     # x29=000000FF
    slli  x29, x29, 8      # x29=0000FF00
    or    x29, x29, x28    # x29=0000FFED
    slli  x29, x29, 16
    or    x29, x27, x29    # x29=FFEDCBA0
    lw    x28, 8(x18)      # x28=FFEDCBA0
    auipc x30, 0
    bne   x28, x29, dummy
    sw    x0, 0(x18)       # mem[0x20]=00000000
    sh    x27, 0(x18)      # mem[0x20]=0000CBA0
    li    x28, 0xD0
    sb    x28, 2(x18)      # mem[0x20]=00D0CBA0
    lw    x28, 0(x18)      # x28=00D0CBA0
    li    x29, 0x00D0CBA0
    auipc x30, 0
    bne   x28, x29, dummy
    lh    x27, 2(x18)      # x27=000000D0
    li    x28, 0xD0
    auipc x30, 0
    bne   x27, x28, dummy

pass_5:
    li    x31, 7
    auipc x30, 0
    bge   x1, x0, dummy    # -1 >= 0 ?
    bge   x8, x1, pass_6   # 1 >= -1 ?
    auipc x30, 0
    j     dummy

pass_6:
    auipc x30, 0
    bgeu  x0, x1, dummy    # 0 >= FFFFFFFF ?
    auipc x30, 0
    bgeu  x8, x1, dummy
    auipc x20, 0
    jalr  x21, x0, pass_7  # just for test : (
    auipc x30, 0
    j     dummy

pass_7:
# original test ends here
    addi  x20, x20, 8
    auipc x30, 0
    bne   x20, x21, dummy
pass_8:
    li    x31, 8
    addi  x1, x0, 1
    lui   x7, 1
    addi  x2, x1, 2
    lui   x7, 1
    beq   x1, x0, dummy
    addi  x3, x2, 3
    lui   x7, 1
    sw    x3, 0(x0)
    lui   x7, 1
    beq   x3, x0, dummy
    lw    x4, 0(x0)
    lui   x7, 1
    addi  x5, x4, 4
    lui   x7, 1
    beq   x5, x0, dummy
    addi  x6, x5, 5
    lui   x7, 1
    addi  x7, x7, 15
    addi  x8, x0, 1
    slli  x8, x8, 12
    sub   x7, x7, x8
    bne   x7, x6, dummy
pass_9:
    li    x31, 9
    addi  x0, x1, 1
    sub   x1, x1, x1
    bne   x0, x1, dummy
pass_10:
    li    x31, 10
    lui   x1, 233
    sw    x1, 0(x0)
    lw    x2, 0(x0)
    bne   x1, x2, dummy
pass_11:
    li    x31, 11
    addi  x1, x0, 233
    beq   x0, x1, dummy
    addi  x1, x0, -1
    blt   x0, x1, dummy
    bltu  x1, x0, dummy

passed:
    li    x31, 0x666
    j     dummy

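This assembly program is adapted from the Lab 4-3 program: after the original test point pass_7, \(4\) new test points, pass_8 through pass_11, are added.

pass_9 mainly tests the ordinary Use-Use Hazard.

pass_10 mainly tests the Load-Use, Use-Store, and Lui-Use Hazards.

pass_11 mainly tests the Control Hazard.

pass_8 combines all of the cases above: it is one long block that mixes every instruction type and contains all of the hazards listed.

Specifically, several lw, sw, and lui instructions are interleaved between one Use and the next, along with branch instructions.

  1. Simulation code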

module sim();
    reg clk;
    reg rst;
    testbench m0(.clk(clk), .rst(rst));
    initial begin
        clk = 1'b0;
        rst = 1'b1;
        #50;
        rst = 1'b0;
    end
    always #10 clk = ~clk;
endmodule
Same as in Lab 5-1.

Results and analysis

Simulation results

alt text

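  1. After the simulation starts, x31 is first set to 8 as expected, which shows that basic program execution works and that the base Lab 4-3 simulation program passes.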

alt text

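  1. x31 is then set to 9 as expected, so test point 8 passes.

    Test points 9, 10, and 11 are analyzed in detail below.

  2. For test point 9, x1 = 1 at first; the program then computes x1 = x1 - x1, subtracting the register from itself, and finally checks whether the result is 0.

    The figure shows x1 changing to 0, so test point 9 passes.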

alt text

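  1. Test point 10 first executes lui x1, 233, then stores x1 to mem[0], loads mem[0] into x2, and finally checks whether x1 = x2.

    This sequence produces Lui-Use and Use-Store hazards.

    The figure shows x1 first changing to 000e9000, which is 233 in hexadecimal (e9) shifted left by \(12\) bits; x2 then becomes 000e9000, so test point 10 passes.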

alt text

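  1. Test point 11 first sets x1 = 233, followed by one branch that uses x1; it then sets x1 to -1, followed by two branches that use x1.

    This sequence produces control hazards.

    The figure shows x1 first changing to 000000e9, the hexadecimal form of 233, and then x1 = ffffffff, after which test point 11 passes.

  2. Finally x31 = 666 (hexadecimal, from li x31, 0x666), so the simulation test passes.

On-board results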

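Discussion questions

Based on the pipeline you built, analyze each of the two code fragments below: do any hazards exist between instructions (if so, list each one), and what is the CPI on your pipeline?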

TP-0:

addi    x1, x0, 0
addi    x2, x0, -1
addi    x3, x0, 1
addi    x4, x0, -1
addi    x5, x0, 1
addi    x6, x0, -1
addi    x1, x1, 0
addi    x2, x2, 1
addi    x3, x3, -1
addi    x4, x4, 1
addi    x5, x5, -1
addi    x6, x6, 1

TP-1:

verilog addi x1, x0, 1 addi x2, x1, 2 addi x3, x1, 3 addi x4, x3, 4

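  1. For TP-0, there are no hazards between instructions, because no instruction depends on the instructions immediately before it: each addi xi, xi, c reads a register that was written six instructions earlier, so that write has already completed.

    Each addi xi, xi, c instruction reads xi in the ID stage, adds the immediate in the EX stage, and writes the result back in the WB stage, with no hazard along the way.

    So \(Cycles = 12+5-1=16, Instruction=12\)

    \(CPI = 16/12 \approx 1.33\)

  2. For TP-1, there is a hazard between the first and second instructions (on x1), resolved by forwarding;

    there is a hazard between the first and third instructions (on x1), resolved by forwarding;

    there is a hazard between the third and fourth instructions (on x3), resolved by forwarding.

    So \(Cycles = 4+5-1=8, Instruction=4\)

    \(CPI = 8/4=2\)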
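The hazard lists and cycle counts for TP-0 and TP-1 can be reproduced with a small Python sketch. It assumes a classic 5-stage pipeline with full forwarding (so register dependences cost no stall cycles here) and treats a dependence as a hazard when the producer is at most two instructions ahead of the consumer. The `(rd, sources)` encoding and the names `raw_hazards` and `total_cycles` are assumptions made for this illustration.

```python
PIPELINE_STAGES = 5

def raw_hazards(program, window=2):
    """Return 1-based (producer, consumer) pairs where the consumer reads a
    register written by an instruction at most `window` slots earlier."""
    hazards = []
    for j, (_, sources) in enumerate(program):
        for i in range(max(0, j - window), j):
            rd = program[i][0]
            if rd != "x0" and rd in sources:
                hazards.append((i + 1, j + 1))
    return hazards

def total_cycles(n_instructions, stalls=0):
    """Cycles to drain the pipeline: fill time plus one cycle per instruction."""
    return n_instructions + PIPELINE_STAGES - 1 + stalls

# TP-0: six independent writes, then each register is reused six slots later.
TP0 = [(f"x{k}", ["x0"]) for k in range(1, 7)] + \
      [(f"x{k}", [f"x{k}"]) for k in range(1, 7)]
# TP-1: the four dependent addi instructions.
TP1 = [("x1", ["x0"]), ("x2", ["x1"]), ("x3", ["x1"]), ("x4", ["x3"])]

for name, prog in (("TP-0", TP0), ("TP-1", TP1)):
    cycles = total_cycles(len(prog))
    print(name, raw_hazards(prog), cycles, f"{cycles / len(prog):.2f}")
```

With these assumptions the sketch reports no hazards and 16 cycles for TP-0, and the pairs (1, 2), (1, 3), (3, 4) with 8 cycles for TP-1.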

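Based on your implementation, simulate the following code on the testbench, show the simulation result, and state how many cycles it takes to complete all instructions. The required signals are clk, IF-PC, ID-PC, and the values of all registers used. Be sure to set the radix to hexadecimal and zoom so that all signal values are visible.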

addi    x1, x0, 1
addi    x2, x1, 2
addi    x3, x2, 3
sw      x3, 0(x0)
lw      x4, 0(x0)
addi    x5, x4, 4
addi    x6, x4, 5

The simulation result is as follows:

alt text

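The program takes \(12\) cycles in total. The pipeline stalls for \(1\) cycle at cycle \(7\) because of a Load-Use hazard (addi x5, x4, 4 uses the value loaded by lw x4, 0(x0)).

All other dependences are resolved by forwarding without further stalls, so the total is \(5+7-1+1=12\) cycles.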
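The count of 12 cycles can be checked with a short Python sketch that looks only for Load-Use pairs. It assumes that a load followed directly by any instruction reading its destination costs exactly one stall cycle and that every other dependence is forwarded. The tuple encoding `(op, rd, sources)` and the function name `load_use_stalls` are hypothetical and chosen for this illustration.

```python
PIPELINE_STAGES = 5

def load_use_stalls(program):
    """Count one-cycle stalls: a lw immediately followed by a reader of its rd."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        op, rd, _ = prev
        if op == "lw" and rd != "x0" and rd in cur[2]:
            stalls += 1
    return stalls

PROGRAM = [
    ("addi", "x1", ["x0"]),
    ("addi", "x2", ["x1"]),
    ("addi", "x3", ["x2"]),
    ("sw",   None, ["x3", "x0"]),   # store data x3, base x0
    ("lw",   "x4", ["x0"]),
    ("addi", "x5", ["x4"]),         # Load-Use: needs x4 right after lw
    ("addi", "x6", ["x4"]),
]

stalls = load_use_stalls(PROGRAM)
cycles = len(PROGRAM) + PIPELINE_STAGES - 1 + stalls
print(stalls, cycles)  # 1 12
```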

Cache

Procedure and experimental steps

Code design hierarchy diagram and description

alt text

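Set-associative mapping

  1. Main memory and the cache are divided into blocks of the same size.
  2. Main memory and the cache are divided into sets of the same size.
  3. The main memory capacity is an integer multiple of the cache capacity. Main memory is divided into regions the size of the cache, and each region contains the same number of sets as the cache.
  4. When data is brought from main memory into the cache, the set number in main memory must equal the set number in the cache: a block from any region can only be placed in the cache set with the same set number, but it may occupy any block within that set. In other words, the mapping from main-memory sets to cache sets is direct, while the mapping inside a pair of corresponding sets is fully associative.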
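As a concrete illustration of this mapping, the sketch below splits an address the same way the cache module in this report does (tag = addr_cpu[31:9], index = addr_cpu[8:2], offset = addr_cpu[1:0], so the offset selects one of the 4 words in a block). The Python function name `split_address` is an assumption for this example; the sample addresses are the ones used in the testbench.

```python
INDEX_BITS = 7    # 128 sets
OFFSET_BITS = 2   # 4 words per block

def split_address(addr):
    """Return (tag, index, offset) for a 32-bit CPU address."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

for addr in (0x207, 0x20A, 0x208, 0x10000207):
    tag, index, offset = split_address(addr)
    print(f"{addr:08x}: tag={tag:#x} index={index} offset={offset}")
# 00000207: tag=0x1 index=1 offset=3
# 0000020a: tag=0x1 index=2 offset=2
# 00000208: tag=0x1 index=2 offset=0
# 10000207: tag=0x80001 index=1 offset=3
```

Addresses 207 and 10000207 share index 1 but differ in tag, so they compete for the two ways of the same set, while 20A and 208 fall into the same block of set 2.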

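LRU replacement algorithm

Based on how each block has been used, the least recently used block is always chosen for replacement. This policy reflects program locality well and usually achieves a high hit rate.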
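For a 2-way set, LRU needs only one "recently used" flag per way. The following Python sketch models a single set under that assumption. It is a simplified behavioral model, not a transcription of the Verilog controller (for example, it prefers an empty way before evicting), and the class name `TwoWaySet` is made up for this illustration.

```python
class TwoWaySet:
    """One 2-way cache set with a single 'recently used' flag per way."""

    def __init__(self):
        self.tags = [None, None]     # None marks an empty (invalid) way
        self.used = [False, False]

    def access(self, tag):
        """Access `tag`; return (hit, way) and update the usage flags."""
        if tag in self.tags:
            hit, way = True, self.tags.index(tag)
        else:
            hit = False
            if None in self.tags:
                way = self.tags.index(None)      # fill an empty way first
            else:
                way = self.used.index(False)     # evict the not-recently-used way
            self.tags[way] = tag
        # Exactly one way is marked as recently used after every access.
        self.used = [w == way for w in range(2)]
        return hit, way

s = TwoWaySet()
print(s.access(0xA), s.access(0xB), s.access(0xA), s.access(0xC))
# (False, 0) (False, 1) (True, 0) (False, 1)
```

The last access evicts tag 0xB from way 1, because the hit on 0xA had just marked way 0 as recently used.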

Write Back

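Method: when the CPU performs a write, the data is written only to the cache, not to main memory; a dirty block is written back to memory when it has to be replaced on a miss.

Advantage: higher speed, since most writes do not access main memory.

Disadvantage: lower reliability (main memory can hold stale data) and more complex control logic.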

Allocate

On a miss, the requested block is copied from memory into the cache.
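The interaction between write back and allocate can be sketched with a one-line cache in Python. This is a deliberately reduced model under stated assumptions: a block holds a single value instead of 4 words, memory is a plain dict, and the class name `WriteBackLine` and its fields are hypothetical.

```python
class WriteBackLine:
    """A single cache line using write-back and write-allocate."""

    def __init__(self, memory):
        self.memory = memory     # dict: block address -> value
        self.block = None        # block currently held, None if empty
        self.data = None
        self.dirty = False
        self.mem_writes = 0      # number of write-backs performed

    def _ensure(self, block):
        if self.block == block:
            return                                   # hit
        if self.dirty:
            self.memory[self.block] = self.data      # write back the dirty block
            self.mem_writes += 1
        self.block = block
        self.data = self.memory.get(block, 0)        # allocate from memory
        self.dirty = False

    def write(self, block, value):
        self._ensure(block)
        self.data = value        # only the cache is updated
        self.dirty = True

    def read(self, block):
        self._ensure(block)
        return self.data

memory = {}
line = WriteBackLine(memory)
line.write(1, 0x11)
line.write(1, 0x22)              # write hit: still no memory traffic
print(memory, line.mem_writes)   # {} 0
line.read(2)                     # miss on a dirty line: write back, then allocate
print(memory, line.mem_writes)   # {1: 34} 1
```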

Source code

1. Data_ram code:

module Data_ram(
    input wire clk, // clock
    input wire en, // enable
    input wire rst, // reset
    input wire [6:0] addr, // address
    input wire [127:0] din, // data write in
    output wire [127:0] dout // data read out
    // input wire [Index_width-1:0] addr, // address
    // input wire [Block_width-1:0] din, // data write in
    // output wire [Block_width-1:0] dout // data read out
);
parameter NUM_of_sets = 128;
parameter Block_width = 128;
parameter Index_width = 7;
//cache line memory: data for way0
reg [Block_width-1:0] cache_data [0:NUM_of_sets-1];

//Read and Write data to Cache
integer i;
always @(posedge clk or posedge rst) begin
    if(rst) begin
        for(i=0; i<NUM_of_sets; i=i+1) begin
            cache_data[i] <= 128'b0;
        end
    end
    else begin
        if(en) begin
            cache_data[addr] <= din;
        end
        else begin
            cache_data[addr] <= cache_data[addr];
        end
    end
end
assign dout = cache_data[addr];
endmodule

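The Data_ram module implements \(128\) registers, each \(128\) bits wide, to hold the cache data; each entry stores one block of \(4\) words.

When rst = 1, the registers are cleared.

When en = 1, data can be written; otherwise writes are disabled.

2. Tag_ram code: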

module Tag_ram(
    input wire clk, // clock
    input wire en, // enable
    input wire rst, // reset
    input wire [6:0] addr, // address
    input wire [25:0] din, // data write in
    output wire [25:0] dout // data read out
);
parameter NUM_of_sets = 128;
parameter Index_width = 7;
parameter TAG_width = 23;
parameter V = 1;
parameter U = 1;
parameter D = 1;

//cache line memory: tag,V,U,D for way0
reg [TAG_width+V+U+D-1:0] cache_TAG [0:NUM_of_sets-1];
integer i;
always @(posedge clk or posedge rst) begin
    if(rst) begin
        for(i=0; i<NUM_of_sets; i=i+1) begin
            cache_TAG[i] <= {(TAG_width+V+U+D){1'b0}};    // clear V, U, D and tag
        end
    end
    else begin
        if(en) begin
            cache_TAG[addr] <= din;    
        end
        else begin
            cache_TAG[addr] <= cache_TAG[addr];    
        end
    end
    //Read and Write TAG to Cache
end

assign dout = cache_TAG[addr];
endmodule

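The Tag_ram module implements \(128\) registers, each \(26\) bits wide, to hold the cache tags and status flags. The most significant bit stores V, which indicates whether the entry is valid; the second bit, U, indicates whether the entry was recently used; the third bit, D, indicates whether the data in the block has been modified, i.e. the dirty bit; the lowest \(23\) bits store the tag.

When rst = 1, the registers are cleared.

When en = 1, data can be written; otherwise writes are disabled.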
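The 26-bit entry layout {V, U, D, tag[22:0]} can be illustrated with a pair of Python helpers. The function names `pack_entry` and `unpack_entry` are assumptions for this sketch; the bit positions follow the description above.

```python
TAG_BITS = 23
TAG_MASK = (1 << TAG_BITS) - 1

def pack_entry(valid, used, dirty, tag):
    """Build a 26-bit Tag_ram entry: bit 25 = V, bit 24 = U, bit 23 = D."""
    return (valid << 25) | (used << 24) | (dirty << 23) | (tag & TAG_MASK)

def unpack_entry(entry):
    """Return (valid, used, dirty, tag) from a 26-bit entry."""
    return (entry >> 25) & 1, (entry >> 24) & 1, (entry >> 23) & 1, entry & TAG_MASK

clean = pack_entry(1, 1, 0, 0x1)   # like {3'b110, tag} after an allocate
dirty = pack_entry(1, 1, 1, 0x1)   # like {3'b111, tag} after a write hit
print(f"{clean:07x} {dirty:07x}")  # 3000001 3800001
print(unpack_entry(dirty))         # (1, 1, 1, 1)
```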

3. Cache code:

module cache(
    input wire clk, // clock
    input wire rst, // reset
    input wire [31:0] data_cpu_write, // data write in
    input wire [31:0] data_mem_read, // data read in
    input wire [31:0] addr_cpu, // cpu addr
    input wire wr_cpu, // cpu write enable
    input wire rd_cpu, // cpu read enable
    input wire ready_mem, // memory ready
    output reg wr_mem, // memory write enable // write back
    output reg rd_mem, // memory read enable 
    output reg [31:0] data_mem_write, // data to mem // write back
    output reg [31:0] data_cpu_read, // data to cpu
    output reg [31:0] addr_mem // memory addr
);
parameter NUM_of_sets = 128;
parameter Block_width = 128;
parameter Index_width = 7;
parameter TAG_width = 23;
parameter tag_width = 26;
parameter V = 1;
parameter U = 1;
parameter D = 1;
parameter IDLE = 0;
parameter CompareTag = 1;
parameter Allocate = 2;
parameter WriteBack = 3;

// Cache Controller State Machine and Logic
reg [1:0] state;
reg [1:0] next_state;
reg [TAG_width+V+U+D-1:0] wtag0, wtag1;
reg ent0, ent1, en0, en1;
wire [TAG_width+V+U+D-1:0] rtag0, rtag1;
reg [Block_width-1:0] wdata;
reg [Block_width-1:0] wdata_hit, rdata_hit;
reg [Block_width-1:0] wdata_miss;
wire [Block_width-1:0] rdata0, rdata1, rdata;
wire [TAG_width-1:0] tag, tag0, tag1;
wire [Index_width-1:0] index;
wire [1:0] offset;

wire hit, hit0, hit1;
wire valid0, valid1;
wire cpu_req_valid;
assign cpu_req_valid = (wr_cpu || rd_cpu);
assign tag = addr_cpu[31:9];
assign index = addr_cpu[8:2];
assign offset = addr_cpu[1:0];
assign valid0 = rtag0[TAG_width+V+U+D-1];
assign valid1 = rtag1[TAG_width+V+U+D-1];
assign tag0 = rtag0[TAG_width-1:0];
assign tag1 = rtag1[TAG_width-1:0];
assign hit0 = ((tag0 == tag) && valid0);
assign hit1 = ((tag1 == tag) && valid1);
assign hit = (hit0 || hit1);
assign rdata = hit0 ? rdata0 : (hit1 ? rdata1 : 128'b0);
reg [2:0] mem_ready, next_mem_ready;
reg [Block_width-1:0] mem_data;
wire dirty;
assign dirty = (rtag0[24:23] == 2'b01) || (rtag1[24:23] == 2'b01);

always@(*) begin
    case(offset)
        2'd0: rdata_hit = rdata[31:0];
        2'd1: rdata_hit = rdata[63:32];
        2'd2: rdata_hit = rdata[95:64];
        2'd3: rdata_hit = rdata[127:96];
    endcase
    case(offset)
        2'd0: wdata_hit = {rdata[127:32], data_cpu_write};
        2'd1: wdata_hit = {rdata[127:64], data_cpu_write, rdata[31:0]};
        2'd2: wdata_hit = {rdata[127:96], data_cpu_write, rdata[63:0]};
        2'd3: wdata_hit = {data_cpu_write, rdata[95:0]};
    endcase
end

Data_ram d0 (
    .clk(clk),
    .rst(rst),
    .addr(index),
    .din(wdata),
    .en(en0),
    .dout(rdata0)
);
Data_ram d1 (
    .clk(clk),
    .rst(rst),
    .addr(index),
    .din(wdata),
    .en(en1),
    .dout(rdata1)
);
Tag_ram t0 (
    .clk(clk),
    .rst(rst),
    .addr(index),
    .din(wtag0),
    .en(ent0),
    .dout(rtag0)
);
Tag_ram t1 (
    .clk(clk),
    .rst(rst),
    .addr(index),
    .din(wtag1),
    .en(ent1),
    .dout(rtag1)
);

always@(posedge clk or posedge rst) begin
    if(rst) begin
        state <= IDLE;
        mem_ready <= 3'd0;
    end
    else begin
        state <= next_state;
        mem_ready <= next_mem_ready;
    end
end


always@(*) begin
    case(state)
        IDLE: begin
            next_mem_ready = 3'd0;
            en0 = 1'b0;
            en1 = 1'b0;
            ent0 = 1'b0;
            ent1 = 1'b0;
            wtag0 = 0;
            wtag1 = 0;
            wr_mem = 0;
            rd_mem = 0;
            data_mem_write = 0;
            //data_cpu_read = 0;
            addr_mem = 0;
            if(cpu_req_valid) begin
                next_state = CompareTag; 
            end
            else begin
                next_state = IDLE;
            end
        end
        CompareTag: begin
            next_mem_ready = 3'd0;    // mem_ready itself is driven only by the clocked block
            if(hit) begin
                next_state = IDLE;
                ent0 = 1'b1;
                ent1 = 1'b1;
                if(wr_cpu) begin // write hit 
                    wdata = wdata_hit;
                    if(hit0) begin
                        en0 = 1'b1;
                        en1 = 1'b0;
                        wtag0 = {3'b111, rtag0[22:0]};    // recently used, set dirty
                        wtag1 = {rtag1[25], 1'b0, rtag1[23:0]};    // not recently used
                    end
                    else begin
                        en0 = 1'b0;
                        en1 = 1'b1;
                        wtag0 = {rtag0[25], 1'b0, rtag0[23:0]};    // not recently used
                        wtag1 = {3'b111, rtag1[22:0]};    // recently used, set dirty
                    end
                end
                else begin // read hit
                    data_cpu_read = rdata_hit;
                    en0 = 1'd0;
                    en1 = 1'd0;
                    wtag0 = {rtag0[25], hit0, rtag0[23:0]};   
                    wtag1 = {rtag1[25], hit1, rtag1[23:0]};    
                end
            end
            else begin
                if(dirty) begin
                    next_state = WriteBack;
                end
                else begin
                    next_state = Allocate;
                end
            end
        end
        Allocate: begin // read/write miss
            if(mem_ready >= 3'd4) begin
                rd_mem = 0;
                next_mem_ready = 3'd0;
                next_state = CompareTag;
                if(rtag0[24] == 1'b0) begin   // not recently used
                    en0 = 1'b1;
                    ent0 = 1'b1;
                    en1 = 1'b0;
                    ent1 = 1'b0;
                    wtag0 = {3'b110, tag};    // recently used, set clean
                    wdata = mem_data;
                end
                else begin
                    en0 = 1'b0;
                    ent0 = 1'b0;
                    en1 = 1'b1;
                    ent1 = 1'b1;
                    wtag1 = {3'b110, tag};    // recently used, set clean
                    wdata = mem_data;
                end
            end
            else begin // get data from memory
                en0 = 1'd0;
                en1 = 1'd0;
                ent0 = 1'd0;
                ent1 = 1'd0;
                rd_mem = 1;
                next_state = Allocate;
                if(ready_mem)
                    next_mem_ready = mem_ready + 1;
                else 
                    next_mem_ready = mem_ready;
                case(mem_ready)
                    3'd0: mem_data[31:0] = data_mem_read;
                    3'd1: mem_data[63:32] = data_mem_read;
                    3'd2: mem_data[95:64] = data_mem_read;
                    3'd3: mem_data[127:96] = data_mem_read;
                endcase
            end
        end

        WriteBack: begin // carry dirty data to memory
            if(mem_ready >= 3'd4) begin
                wr_mem = 0;
                next_mem_ready = 3'd0;
                en0 = 1'd0;
                en1 = 1'd0;
                ent0 = 1'd0;
                ent1 = 1'd0;
                next_state = Allocate;
            end
            else begin
                wr_mem = 1;
                next_state = WriteBack;
                if(ready_mem)
                    next_mem_ready = mem_ready + 1;
                else 
                    next_mem_ready = mem_ready;

                if(rtag0[24] == 1'b0) begin   // not recently used
                    en0 = 1'b1;
                    ent0 = 1'b1;
                    en1 = 1'b0;
                    ent1 = 1'b0;
                    case(mem_ready)
                        3'd0: begin addr_mem = {tag0, index, 2'b00}; data_mem_write = rdata0[31:0]; end
                        3'd1: begin addr_mem = {tag0, index, 2'b01}; data_mem_write = rdata0[63:32]; end
                        3'd2: begin addr_mem = {tag0, index, 2'b10}; data_mem_write = rdata0[95:64]; end
                        3'd3: begin addr_mem = {tag0, index, 2'b11}; data_mem_write = rdata0[127:96]; end
                    endcase
                end
                else begin
                    en0 = 1'b0;
                    ent0 = 1'b0;
                    en1 = 1'b1;
                    ent1 = 1'b1;
                    case(mem_ready)
                        3'd0: begin addr_mem = {tag1, index, 2'b00}; data_mem_write = rdata1[31:0]; end
                        3'd1: begin addr_mem = {tag1, index, 2'b01}; data_mem_write = rdata1[63:32]; end
                        3'd2: begin addr_mem = {tag1, index, 2'b10}; data_mem_write = rdata1[95:64]; end
                        3'd3: begin addr_mem = {tag1, index, 2'b11}; data_mem_write = rdata1[127:96]; end
                    endcase
                end

            end
        end
        default: next_state = IDLE;
    endcase
end

endmodule

The cache module instantiates \(4\) ram modules, which store the data and tags of the two ways, and implements the finite state machine of the cache controller:

alt text

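When the CPU issues a read or write request, the controller moves from IDLE to CompareTag and checks whether the set selected by index holds a matching tag. If it does, the access is a hit: the controller returns to IDLE and updates the V and U flags, and if the access is a write it also sets the Dirty flag.

On a miss, the data has to be brought from memory into the cache.

If the selected set is already full, the block that was not recently used must be replaced.

If that block has been modified, the modified data must first be stored back to memory, i.e. write back.

The requested data is then copied from memory into the cache, i.e. allocate.

After the transfer, the tags are compared again; the access is now a hit and is handled as a hit.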
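The state transitions described above can be summarized as a next-state function. The Python sketch below mirrors only the transitions, not the control outputs. The state encoding follows the parameters of the cache module (IDLE = 0, CompareTag = 1, Allocate = 2, WriteBack = 3); the argument `transfer_done` is a name introduced here and stands for the condition mem_ready >= 4 in the Verilog.

```python
IDLE, COMPARE_TAG, ALLOCATE, WRITE_BACK = range(4)

def next_state(state, request, hit, dirty, transfer_done):
    """Next controller state for the current state and inputs."""
    if state == IDLE:
        return COMPARE_TAG if request else IDLE
    if state == COMPARE_TAG:
        if hit:
            return IDLE                              # hit: access completes
        return WRITE_BACK if dirty else ALLOCATE     # miss: write back first if dirty
    if state == ALLOCATE:
        return COMPARE_TAG if transfer_done else ALLOCATE
    if state == WRITE_BACK:
        return ALLOCATE if transfer_done else WRITE_BACK
    return IDLE

# A miss on a dirty block: CompareTag -> WriteBack -> Allocate -> CompareTag -> IDLE
trace = [COMPARE_TAG]
trace.append(next_state(trace[-1], True, False, True, False))   # miss, dirty
trace.append(next_state(trace[-1], True, False, True, True))    # 4 words written back
trace.append(next_state(trace[-1], True, False, False, True))   # 4 words fetched
trace.append(next_state(trace[-1], True, True, False, False))   # now a hit
print(trace)  # [1, 3, 2, 1, 0]
```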

Key simulation steps

1. testbench code:

module cache_tb();

reg clk; // clock
reg rst; // reset
reg [31:0] data_cpu_write; // data write in
reg [31:0] data_mem_read; // data read in
reg [31:0] addr_cpu; // cpu addr
reg wr_cpu; // cpu write enable
reg rd_cpu; // cpu read enable
reg ready_mem; // memory ready
wire wr_mem; // memory write enable // write back
wire rd_mem; // memory read enable 
wire [31:0] data_mem_write; // data to mem // write back
wire [31:0] data_cpu_read; // data to cpu
wire [31:0] addr_mem; // memory addr

cache c(.clk(clk), .rst(rst), .data_cpu_write(data_cpu_write), 
    .data_mem_read(data_mem_read), .addr_cpu(addr_cpu),
    .wr_cpu(wr_cpu), .rd_cpu(rd_cpu), .ready_mem(ready_mem),
    .wr_mem(wr_mem), .rd_mem(rd_mem), .data_mem_write(data_mem_write),
    .data_cpu_read(data_cpu_read), .addr_mem(addr_mem)
);

initial begin
    wr_cpu = 0;
    rd_cpu = 0;
    clk = 1;
    rst = 1;
    ready_mem = 1;
    #60;
    rst = 0;
    #40;
    //write miss
    wr_cpu = 1'd1;
    addr_cpu = 32'h00000207;
    data_cpu_write = 32'h19198100;
    #20;
    #20;
    data_mem_read = 32'habababab;
    #20;
    data_mem_read = 32'hcdcdcdcd;
    #20;
    data_mem_read = 32'h12345678;
    #20;
    data_mem_read = 32'h11451400;
    #200;
    wr_cpu = 1'd0;
    //read hit
    rd_cpu = 1'd1;
    addr_cpu = 32'h00000207;
    #100;
    rd_cpu = 1'd0;



    //write hit
    wr_cpu = 1'd1;
    addr_cpu = 32'h00000207;
    data_cpu_write = 32'hdeadbeef;
    #120;
    wr_cpu = 0;
    //read miss
    rd_cpu = 1'd1;
    addr_cpu = 32'h0000020A;
    #20;
    #20;
    data_mem_read = 32'haaaaaaaa;
    #20;
    data_mem_read = 32'hbbbbbbbb;
    #20;
    data_mem_read = 32'hcccccccc;
    #20;
    data_mem_read = 32'hdddddddd;
    #40;
    rd_cpu = 1'd0;
    #100;

    //read hit
    rd_cpu = 1'd1;
    addr_cpu = 32'h00000208;
    #40;
    rd_cpu = 1'd0;
    #100;
end
always #10 clk = ~clk;
endmodule

Read and write accesses are simulated for both the hit and the miss case.

Experimental results and analysis

Simulation results

alt text

alt text

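  1. Initially the cache is empty.

    Then the value 19198100 is written to address 207. Because the cache line for 207 is empty, the cache has to interact with memory for \(4\) cycles to fetch the corresponding \(4\) words.

    The \(4\) words are then written into the cache, completing the allocate stage.

  2. The tags are then compared again and the access hits, so the data in the block is modified and the block is marked dirty = 1.

    In the end, the word 11451400 in the block is replaced by 19198100.

  3. Next, the \(4\)th word of the block at address 207 is read and returns 19198100, which is a read hit.

  4. Then the data at 207 is changed to deadbeef, which is a write hit.

  5. Then address 20A is read, which is a read miss, so the cache interacts with memory for \(4\) cycles and writes the data into the cache.

  6. The controller then re-enters the CompareTag state, sees a read hit, and returns the data cccccccc.

  7. Reading address 208 is also a read hit and returns aaaaaaaa.

  8. Next, address 10000207 is written. The tag differs, so the controller allocates again and reads \(4\) words.

  9. This is followed by a write hit: the recently-used bit of way \(0\) is set to U = 0, way \(1\) is set to U = 1, D = 1, and the data 20241225 is written.

  10. Next, address 20000207 is written, which misses. Because ways \(0\) and \(1\) are both full, the not-recently-used way \(0\) is chosen for write back, and \(4\) words are written to memory.

  11. Then allocate is performed, reading \(4\) words.

  12. Finally a write hit occurs and the data 07210721 is written.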