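My design may differ considerably from the reference implementation, and the simulation is not exhaustive, so some bugs may exist that were never detected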
SCPU
Lab 4-1
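Procedure and Experimental Steps
Code Hierarchy Diagram and Description
The SCPU module consists of DataPath and SCPU_Ctrl. SCPU_Ctrl takes inst[6:2], inst[14:12], and inst[30] as inputs, which carry the opcode (OPcode), funct3 (Fun3), and bit 5 of funct7 (Fun7), respectively
The DataPath module takes the control signals from SCPU_Ctrl as inputs and outputs the ALU result, the data to be written to RAM, and the value of the program counter PC
The top-level SCPU module takes the clock clk, the instruction inst, the reset rst, and other inputs, and outputs the results of single-cycle CPU execution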
Source Code
SCPU code:
module SCPU(
input clk,
input rst,
input MIO_ready,
input [31:0]inst_in,
input [31:0]Data_in,
output CPU_MIO,
output MemRW,
output [31:0]PC_out,
output [31:0]Data_out,
output [31:0]Addr_out,
output [2:0] ALU_Control
);
wire [1:0] ImmSel;
wire ALUSrc_B;
wire [1:0] MemtoReg;
wire Jump;
wire Branch;
wire RegWrite;
SCPU_ctrl Ctrl(.OPcode(inst_in[6:2]), .Fun3(inst_in[14:12]), .Fun7(inst_in[30]),
.MIO_ready(MIO_ready), .ImmSel(ImmSel), .ALUSrc_B(ALUSrc_B), .MemtoReg(MemtoReg),
.Jump(Jump), .Branch(Branch), .RegWrite(RegWrite), .MemRW(MemRW),
.ALU_Control(ALU_Control), .CPU_MIO(CPU_MIO));
DataPath DP(.ImmSel(ImmSel), .ALUSrc_B(ALUSrc_B), .MemtoReg(MemtoReg),
.Jump(Jump), .Branch(Branch), .RegWrite(RegWrite), .ALU_Control(ALU_Control),
.Data_in(Data_in), .clk(clk), .inst_field(inst_in), .rst(rst),
.ALU_out(Addr_out), .Data_out(Data_out), .PC_out(PC_out));
endmodule
The top-level code does the wiring: it connects the SCPU_Ctrl and DataPath modules to each other and to the corresponding inputs and outputs
Lab 4-2
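Procedure and Experimental Steps
Code Hierarchy Diagram and Description
The structure of DataPath is shown in the diagram below:
The DataPath module consists of the following submodules: RegFile, ALU, ImmGen
RegFile and ALU were completed in Lab 1; ImmGen is completed in this lab
Given the ImmSel signal produced by SCPU_Ctrl, the ImmGen module generates the corresponding immediate from inst and sends it to the ALU, RegFile, and other downstream modules
The SCPU_Ctrl module identifies the instruction format from inst, generates the control signals for each instruction, and sends them to DataPath
Source Code
1. ImmGen code: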
module ImmGen(
input [1:0] ImmSel,
input [31:0] inst_field,
output reg [31:0] Imm_out
);
always @(*) begin
case (ImmSel)
2'b00: // I-type
Imm_out = {{20{inst_field[31]}}, inst_field[31:20]};
2'b01: // S-type
Imm_out = {{20{inst_field[31]}}, inst_field[31:25], inst_field[11:7]}; //
2'b10: // B-type
Imm_out = {{19{inst_field[31]}}, inst_field[31], inst_field[7],
inst_field[30:25], inst_field[11:8], 1'b0};
2'b11: // J-type
Imm_out = {{11{inst_field[31]}}, inst_field[31], inst_field[19:12],
inst_field[20], inst_field[30:21], 1'b0}; //
endcase
end
endmodule
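For I-type instructions, sign-extending inst[31:20] gives the final immediate;
For S-type instructions, concatenating inst[31:25] and inst[11:7] and sign-extending the result gives the final immediate;
For B-type and J-type instructions, the scrambled immediate fields are reassembled in order, a 0 is appended as the least significant bit, and the result is sign-extended to give the final immediate.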
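As a concrete check of the B-type reassembly, the following self-contained sketch decodes `beq x1, x1, -12` (encoding 32'hFE108AE3, taken from the ImmGen testbench in this report). The names `b_imm` and `b_imm_check` are illustrative assumptions and not part of the lab code; the concatenation mirrors the 2'b10 branch of ImmGen.

```verilog
`timescale 1ns/1ps
// Illustrative sketch: b_imm and b_imm_check are hypothetical names.
module b_imm_check;
    // Reassemble the B-type immediate the same way the 2'b10 branch of ImmGen does:
    // imm[12]=inst[31], imm[11]=inst[7], imm[10:5]=inst[30:25],
    // imm[4:1]=inst[11:8], imm[0]=0, then sign-extend to 32 bits.
    function [31:0] b_imm;
        input [31:0] inst;
        begin
            b_imm = {{19{inst[31]}}, inst[31], inst[7],
                     inst[30:25], inst[11:8], 1'b0};
        end
    endfunction

    reg [31:0] imm;

    initial begin
        imm = b_imm(32'hFE108AE3);          // beq x1, x1, -12
        if (imm === 32'hFFFFFFF4)           // -12 in two's complement
            $display("OK: imm = %0d", $signed(imm));
        else
            $display("MISMATCH: imm = %h", imm);
        $finish;
    end
endmodule
```

Here inst[31]=1, inst[7]=1, inst[30:25]=111111, and inst[11:8]=1010, so the 13-bit offset is 1_1_111111_1010_0, which sign-extends to 0xFFFFFFF4.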
2. SCPU_Ctrl code:
module SCPU_ctrls(
input [4:0] OPcode,
input [2:0] Fun3,
input Fun7,
input MIO_ready,
output reg [1:0] ImmSel,
output reg ALUSrc_B,
output reg [1:0] MemtoReg,
output reg Jump,
output reg Branch,
output reg RegWrite,
output reg MemRW,
output reg [3:0] ALU_Control,
output reg CPU_MIO
);
initial begin
ImmSel = 2'b00;
ALUSrc_B = 1'b0;
MemtoReg = 2'b00;// 0: ALU result 1: Load from RAM to reg 2/3: PC4, JAL
Jump = 1'b0;
Branch = 1'b0;
RegWrite = 1'b0;
MemRW = 1'b0;// write to / read from RAM
ALU_Control = 4'b0000;
end
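always @(*) begin
// Default values: an output that a branch below does not assign (for example
// MemRW in the R-type branch) would otherwise hold its previous value as a latch
ImmSel = 2'b00;
ALUSrc_B = 1'b0;
MemtoReg = 2'b00;
Jump = 1'b0;
Branch = 1'b0;
RegWrite = 1'b0;
MemRW = 1'b0;
ALU_Control = 4'b0000;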
case (OPcode)
5'b01100: begin // R-type
ALUSrc_B = 1'b0;
MemtoReg = 2'b00;
RegWrite = 1'b1;
Jump = 1'b0;
Branch = 1'b0;
case ({Fun3, Fun7})
4'b0000: ALU_Control = 4'b0000; // ADD
4'b0001: ALU_Control = 4'b0001; // SUB
4'b0010: ALU_Control = 4'b0010; // SLL
4'b0100: ALU_Control = 4'b0011; // SLT
4'b0110: ALU_Control = 4'b0100; // SLTU
4'b1000: ALU_Control = 4'b0101; // XOR
4'b1010: ALU_Control = 4'b0110; // SRL
4'b1011: ALU_Control = 4'b0111; // SRA
4'b1100: ALU_Control = 4'b1000; // OR
4'b1110: ALU_Control = 4'b1001; // AND
endcase
end
5'b00000: begin // Load
ALUSrc_B = 1'b1;
MemtoReg = 2'b01;
RegWrite = 1'b1;
ImmSel = 2'b00;
MemRW = 1'b0; // read
Jump = 1'b0;
Branch = 1'b0;
ALU_Control = 4'b0000; // ADD for address calculation
end
5'b01000: begin // Store
ALUSrc_B = 1'b1;
MemtoReg = 2'bx;
RegWrite = 1'b0;
ImmSel = 2'b01;
MemRW = 1'b1; // write
Jump = 1'b0;
Branch = 1'b0;
ALU_Control = 4'b0000; // ADD for address calculation
end
5'b11000: begin // Branch
ALUSrc_B = 1'b0;
MemtoReg = 2'bx; // new
RegWrite = 1'b0; //
ImmSel = 2'b10;
Branch = 1'b1;
Jump = 1'b0;
ALU_Control = 4'b0001; // SUB for branch condition check
end
5'b11011: begin // JAL
ALUSrc_B = 1'bx;
MemtoReg = 2'b10; // PC + 4
RegWrite = 1'b1; //
ImmSel = 2'b11;
Branch = 1'b0;
Jump = 1'b1;
end
5'b00100: begin // I-type ALU
ALUSrc_B = 1'b1;
MemtoReg = 2'b00;
RegWrite = 1'b1;
ImmSel = 2'b00;
Jump = 1'b0;
Branch = 1'b0;
case (Fun3)
3'b000: ALU_Control = 4'b0000; // ADDI
3'b001: ALU_Control = 4'b0010; // SLLI
3'b010: ALU_Control = 4'b0011; // SLTI
3'b011: ALU_Control = 4'b0100; // SLTIU
3'b100: ALU_Control = 4'b0101; // XORI
3'b101: ALU_Control = (Fun7) ? 4'b0111 : 4'b0110; // SRAI / SRLI
3'b110: ALU_Control = 4'b1000; // ORI
3'b111: ALU_Control = 4'b1001; // ANDI
endcase
end
endcase
end
endmodule
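To support R-type (ALU) instructions, the ALUSrc_B and ALU_Control signals are added. When ALUSrc_B=0, register rs2 is selected as ALU input B;
ALU_Control selects which basic operation the ALU performs, and the ALU outputs the result
The ALU_Control values correspond one-to-one with the operation codes of the ALU from Lab 1
The RegWrite signal is also added; it drives the RegFile write enable so that the ALU result can be stored in a register
The MemtoReg signal is also added to select the value written to the register; MemtoReg=0 writes the ALU result
To support I-type instructions (loads and immediate ALU operations), the ImmSel signal is added as an input to ImmGen to select the immediate format; ALUSrc_B=1 selects the immediate as the second ALU operand B
For load instructions, MemtoReg=1 writes the value read from memory into the register
To support store instructions, the MemRW signal is added; MemRW=1 enables the memory write
To support B-type and J-type instructions, the Branch and Jump signals are added; MemtoReg=2 writes PC+4 into the register
3. DataPath code: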
module DataPaths(
input [1:0] ImmSel, input ALUSrc_B, input [1:0] MemtoReg, input Jump, input Branch, input RegWrite, input [3:0] ALU_Control,
input [31:0] Data_in, input clk, input [31:0] inst_field, input rst,
output [31:0] Reg00, output [31:0] Reg01, output [31:0] Reg02, output [31:0] Reg03,
output [31:0] Reg04, output [31:0] Reg05, output [31:0] Reg06, output [31:0] Reg07,
output [31:0] Reg08, output [31:0] Reg09, output [31:0] Reg10, output [31:0] Reg11,
output [31:0] Reg12, output [31:0] Reg13, output [31:0] Reg14, output [31:0] Reg15,
output [31:0] Reg16, output [31:0] Reg17, output [31:0] Reg18, output [31:0] Reg19,
output [31:0] Reg20, output [31:0] Reg21, output [31:0] Reg22, output [31:0] Reg23,
output [31:0] Reg24, output [31:0] Reg25, output [31:0] Reg26, output [31:0] Reg27,
output [31:0] Reg28, output [31:0] Reg29, output [31:0] Reg30, output [31:0] Reg31,
output [31:0] ALU_out, output [31:0] Data_out, output [31:0] PC_out);
wire and_2;
wire [31:0] and_2_out;
wire ALU_zero;
wire [31:0] add_0;
wire [31:0] ALU_A;
wire [31:0] ALU_B;
reg [31:0] reg_wt_data;
wire [31:0] PC4;
wire [31:0] PC_in;
wire [31:0] ImmOut;
assign PC4 = PC_out + 32'd4;
assign and_2 = Branch & ALU_zero;
assign add_0 = ImmOut + PC_out;
always @ (*) begin
case (MemtoReg)
2'd0: reg_wt_data = ALU_out; //
2'd1: reg_wt_data = Data_in;
2'd2: reg_wt_data = PC4;
2'd3: reg_wt_data = PC4;
endcase
end
assign ALU_B = ALUSrc_B ? ImmOut : Data_out;
assign and_2_out = and_2 ? add_0 : PC4;
assign PC_in = Jump ? add_0 : and_2_out;
ALU alu(.A(ALU_A), .B(ALU_B), .ALU_operation(ALU_Control), .res(ALU_out), .zero(ALU_zero));
wire [4:0] RS1, RS2, WT;
assign RS1 = inst_field[19:15];
assign RS2 = inst_field[24:20];
assign WT = inst_field[11:7];
Regs regs(.clk(clk), .rst(rst), .Rs1_addr(RS1), .Rs2_addr(RS2),
.Wt_addr(WT), .Wt_data(reg_wt_data), .RegWrite(RegWrite),
.Rs1_data(ALU_A), .Rs2_data(Data_out),
.Reg00(Reg00),.Reg01(Reg01),.Reg02(Reg02),.Reg03(Reg03),
.Reg04(Reg04),.Reg05(Reg05),.Reg06(Reg06),.Reg07(Reg07),
.Reg08(Reg08),.Reg09(Reg09),.Reg10(Reg10),.Reg11(Reg11),
.Reg12(Reg12),.Reg13(Reg13),.Reg14(Reg14),.Reg15(Reg15),
.Reg16(Reg16),.Reg17(Reg17),.Reg18(Reg18),.Reg19(Reg19),
.Reg20(Reg20),.Reg21(Reg21),.Reg22(Reg22),.Reg23(Reg23),
.Reg24(Reg24),.Reg25(Reg25),.Reg26(Reg26),.Reg27(Reg27),
.Reg28(Reg28),.Reg29(Reg29),.Reg30(Reg30),.Reg31(Reg31));
ImmGen immgen(.ImmSel(ImmSel), .inst_field(inst_field), .Imm_out(ImmOut));
Reg PC(.clk(clk), .rst(rst), .CE(1'b1), .D(PC_in), .Q(PC_out));
endmodule
The DataPath code instantiates the ImmGen, RegFile, ALU, and other modules, and uses the control signals passed in from SCPU_Ctrl to control what each module does
4. SCPU code:
module SCPU(
input clk,
input rst,
input MIO_ready,
input [31:0] inst_in,
input [31:0] Data_in,
output CPU_MIO,
output MemRW,
output [31:0] PC_out,
output [31:0] Data_out,
output [31:0] Addr_out,
output [31:0] Reg00, output [31:0] Reg01,output [31:0] Reg02,output [31:0] Reg03,
output [31:0] Reg04, output [31:0] Reg05,output [31:0] Reg06,output [31:0] Reg07,
output [31:0] Reg08, output [31:0] Reg09,output [31:0] Reg10,output [31:0] Reg11,
output [31:0] Reg12, output [31:0] Reg13,output [31:0] Reg14,output [31:0] Reg15,
output [31:0] Reg16, output [31:0] Reg17,output [31:0] Reg18,output [31:0] Reg19,
output [31:0] Reg20, output [31:0] Reg21,output [31:0] Reg22,output [31:0] Reg23,
output [31:0] Reg24, output [31:0] Reg25,output [31:0] Reg26,output [31:0] Reg27,
output [31:0] Reg28, output [31:0] Reg29,output [31:0] Reg30,output [31:0] Reg31,
output [3:0] ALU_Control
);
wire [1:0] ImmSel;
wire ALUSrc_B;
wire [1:0] MemtoReg;
wire Jump;
wire Branch;
wire RegWrite;
wire [4:0] OP;
wire [2:0] FUN3;
wire FUN7;
assign OP = inst_in[6:2];
assign FUN3 = inst_in[14:12];
assign FUN7 = inst_in[30];
SCPU_ctrls Ctrl(.OPcode(OP), .Fun3(FUN3), .Fun7(FUN7),
.MIO_ready(MIO_ready), .ImmSel(ImmSel), .ALUSrc_B(ALUSrc_B), .MemtoReg(MemtoReg),
.Jump(Jump), .Branch(Branch), .RegWrite(RegWrite), .MemRW(MemRW), .ALU_Control(ALU_Control),
.CPU_MIO(CPU_MIO));
DataPaths DP(.ImmSel(ImmSel), .ALUSrc_B(ALUSrc_B), .MemtoReg(MemtoReg),
.Jump(Jump), .Branch(Branch), .RegWrite(RegWrite), .ALU_Control(ALU_Control),
.Data_in(Data_in), .clk(clk), .inst_field(inst_in), .rst(rst),
.ALU_out(Addr_out), .Data_out(Data_out), .PC_out(PC_out),
.Reg00(Reg00),.Reg01(Reg01),.Reg02(Reg02),.Reg03(Reg03),
.Reg04(Reg04),.Reg05(Reg05),.Reg06(Reg06),.Reg07(Reg07),
.Reg08(Reg08),.Reg09(Reg09),.Reg10(Reg10),.Reg11(Reg11),
.Reg12(Reg12),.Reg13(Reg13),.Reg14(Reg14),.Reg15(Reg15),
.Reg16(Reg16),.Reg17(Reg17),.Reg18(Reg18),.Reg19(Reg19),
.Reg20(Reg20),.Reg21(Reg21),.Reg22(Reg22),.Reg23(Reg23),
.Reg24(Reg24),.Reg25(Reg25),.Reg26(Reg26),.Reg27(Reg27),
.Reg28(Reg28),.Reg29(Reg29),.Reg30(Reg30),.Reg31(Reg31));
endmodule
5. Code for the other modules:
ALU
module ALU(A, B, ALU_operation, res, zero, overflow );
input wire [31:0] A;
input wire [31:0] B;
input wire [3:0] ALU_operation;
output reg [31:0] res;
output wire zero;
output reg overflow;
wire [31:0] res_and,res_or,res_add,res_sub,
res_xor,res_slt,res_sltu,res_sll,res_srl,res_sra;
wire [4:0] bit;
assign res_xor = A^B;
assign res_and = A&B;
assign res_or = A|B;
assign res_add = $signed($signed(A)+$signed(B));
assign res_sub = $signed($signed(A)-$signed(B));
assign res_slt = ($signed(A) < $signed(B)) ? 1 : 0;
assign res_sltu = ($unsigned(A) < $unsigned(B)) ? 1 : 0;
assign bit = $unsigned(B[4:0]);
assign res_sll = A<<B[4:0];
assign res_srl = A>>B[4:0];
assign res_sra = $signed(A)>>>$signed(B[4:0]);
always @ (*) begin // res depends on the intermediate res_* wires, so the block must be sensitive to them as well
case (ALU_operation)
4'd0: begin res=res_add;overflow=(res<A);end
4'd1: begin res=res_sub;overflow=(res>A);end
4'd2: begin res=res_sll;overflow=0;end
4'd3: begin res=res_slt;overflow=0;end
4'd4: begin res=res_sltu;overflow=0;end
4'd5: begin res=res_xor;overflow=0;end
4'd6: begin res=res_srl;overflow=0;end
4'd7: begin res=res_sra;overflow=0;end
4'd8: begin res=res_or;overflow=0;end
4'd9: begin res=res_and;overflow=0;end
default: begin res=32'h0;overflow=0;end
endcase
end
assign zero = (res==0)? 1: 0;
endmodule
RegFile
module Regs(
input wire clk,
input wire rst,
input wire [4:0] Rs1_addr,
input wire [4:0] Rs2_addr,
input wire [4:0] Wt_addr,
input wire [31:0] Wt_data,
input wire RegWrite,
output wire [31:0] Rs1_data,
output wire [31:0] Rs2_data,
output wire [31:0] Reg00,
output wire [31:0] Reg01,
output wire [31:0] Reg02,
output wire [31:0] Reg03,
output wire [31:0] Reg04,
output wire [31:0] Reg05,
output wire [31:0] Reg06,
output wire [31:0] Reg07,
output wire [31:0] Reg08,
output wire [31:0] Reg09,
output wire [31:0] Reg10,
output wire [31:0] Reg11,
output wire [31:0] Reg12,
output wire [31:0] Reg13,
output wire [31:0] Reg14,
output wire [31:0] Reg15,
output wire [31:0] Reg16,
output wire [31:0] Reg17,
output wire [31:0] Reg18,
output wire [31:0] Reg19,
output wire [31:0] Reg20,
output wire [31:0] Reg21,
output wire [31:0] Reg22,
output wire [31:0] Reg23,
output wire [31:0] Reg24,
output wire [31:0] Reg25,
output wire [31:0] Reg26,
output wire [31:0] Reg27,
output wire [31:0] Reg28,
output wire [31:0] Reg29,
output wire [31:0] Reg30,
output wire [31:0] Reg31
);
integer i;
reg [31:0] Reg [31:0];
initial begin
for(i = 0; i < 32; i = i + 1) begin
Reg[i] <= 32'b0;
end
end
assign Reg00 = Reg[0];
assign Reg01 = Reg[1];
assign Reg02 = Reg[2];
assign Reg03 = Reg[3];
assign Reg04 = Reg[4];
assign Reg05 = Reg[5];
assign Reg06 = Reg[6];
assign Reg07 = Reg[7];
assign Reg08 = Reg[8];
assign Reg09 = Reg[9];
assign Reg10 = Reg[10];
assign Reg11 = Reg[11];
assign Reg12 = Reg[12];
assign Reg13 = Reg[13];
assign Reg14 = Reg[14];
assign Reg15 = Reg[15];
assign Reg16 = Reg[16];
assign Reg17 = Reg[17];
assign Reg18 = Reg[18];
assign Reg19 = Reg[19];
assign Reg20 = Reg[20];
assign Reg21 = Reg[21];
assign Reg22 = Reg[22];
assign Reg23 = Reg[23];
assign Reg24 = Reg[24];
assign Reg25 = Reg[25];
assign Reg26 = Reg[26];
assign Reg27 = Reg[27];
assign Reg28 = Reg[28];
assign Reg29 = Reg[29];
assign Reg30 = Reg[30];
assign Reg31 = Reg[31];
always @(posedge clk or posedge rst) begin
if (rst) begin
for (i = 0; i < 32; i = i + 1) begin
Reg[i] <= 32'b0;
end
end
else begin
if (RegWrite && (Wt_addr != 5'b0)) begin
Reg[Wt_addr] <= Wt_data;
end
else begin
Reg[Wt_addr] <= Reg[Wt_addr];
end
end
end
assign Rs1_data = Reg[Rs1_addr];
assign Rs2_data = Reg[Rs2_addr];
endmodule
module Reg(
input wire clk,
input wire rst,
input wire CE,
input [31:0] D,
output [31:0] Q
);
reg [31:0] Reg;
initial begin
Reg = 32'd0;
end
always @(posedge clk or posedge rst) begin
if(rst) begin
Reg <= 32'd0;
end
else begin
if(CE) Reg <= D;
else Reg <= Reg;
end
end
assign Q = Reg;
endmodule
Key Simulation Steps
To simulate the design, we build a simulation platform, testbench, which instantiates SCPU, ROM, and RAM to run the test program
1. testbench code:
module testbench(
input wire clk,
input wire rst
);
/* SCPU output */
wire [31:0] Addr_out;
wire [31:0] Data_out;
wire CPU_MIO;
wire MemRW;
wire [31:0] PC_out;
/* RAM output */
wire [31:0] douta;
/* ROM output */
wire [31:0] spo;
SCPU u0(
.clk(clk),
.rst(rst),
.Data_in(douta),
.MIO_ready(CPU_MIO),
.inst_in(spo),
.Addr_out(Addr_out),
.Data_out(Data_out),
.CPU_MIO(CPU_MIO),
.MemRW(MemRW),
.PC_out(PC_out)
);
RAM_B u1(
.clka(~clk),
.wea(MemRW),
.addra(Addr_out[11:2]),
.dina(Data_out),
.douta(douta)
);
ROM_D u2(
.a(PC_out[11:2]),
.spo(spo)
);
endmodule
The testbench instantiates SCPU, ROM, and RAM and wires them together
Simulation code:
ImmGen simulation
`timescale 1ns/1ps
`define IMM_SEL_WIDTH 2
`define IMM_SEL_I `IMM_SEL_WIDTH'd0
`define IMM_SEL_S `IMM_SEL_WIDTH'd1
`define IMM_SEL_B `IMM_SEL_WIDTH'd2
`define IMM_SEL_J `IMM_SEL_WIDTH'd3
module ImmGen_tb();
reg [1:0] ImmSel;
reg [31:0] inst_field;
wire[31:0] Imm_out;
ImmGen m0 (.ImmSel(ImmSel), .inst_field(inst_field), .Imm_out(Imm_out));
`define LET_INST_BE(inst) \
inst_field = inst; \
#5;
initial begin
$dumpfile("ImmGen.vcd");
$dumpvars(1, ImmGen_tb);
#5;
/* Test for I-Type */
ImmSel = `IMM_SEL_I;
`LET_INST_BE(32'h3E810093); //addi x1, x2, 1000
`LET_INST_BE(32'h00A14093); //xori x1, x2, 10
`LET_INST_BE(32'h00116093); //ori x1, x2, 1
`LET_INST_BE(32'h00017093); //andi x1, x2, 0
`LET_INST_BE(32'h01411093); //slli x1, x2, 20
`LET_INST_BE(32'h00515093); //srli x1, x2, 5
`LET_INST_BE(32'h41815093); //srai x1, x2, 24
`LET_INST_BE(32'hFFF12093); //slti x1, x2, -1
`LET_INST_BE(32'h3FF13093); //sltiu x1, x2, 1023
`LET_INST_BE(32'h0E910083); //lb x1, 233(x2)
#20;
/* Test for S-Type */
ImmSel = `IMM_SEL_S;
`LET_INST_BE(32'hFE110DA3); //sb x1, -5(x2)
`LET_INST_BE(32'h00211023); //sh x2, 0(x2)
`LET_INST_BE(32'h00C0A523); //sw x12, 10(x1)
#20;
/* Test for B-Type */
ImmSel = `IMM_SEL_B;
`LET_INST_BE(32'hFE108AE3); //beq x1, x1, -12
`LET_INST_BE(32'h00211463); //bne x2, x2, 8
`LET_INST_BE(32'h0031CA63); //blt x3, x3, 20
`LET_INST_BE(32'hFE4256E3); //bge x4, x4, -20
#20;
/* Test for J-Type */
ImmSel = `IMM_SEL_J;
`LET_INST_BE(32'hF9DFF06F); //jal x0, -100
`LET_INST_BE(32'h3FE000EF); //jal x1, 1023 NOTE: does ImmGen output 1023?
#50; $finish();
end
endmodule
SCPU_Ctrl simulation
`timescale 1ns/1ps
`include "Lab4_header.vh"
module SCPU_ctrl_tb();
reg [4:0] OPcode;
reg [2:0] Fun3;
reg Fun7;
reg MIO_ready;
wire [1:0] ImmSel;
wire ALUSrc_B;
wire [1: 0] MemtoReg;
wire Jump;
wire Branch;
wire RegWrite;
wire MemRW;
wire [3:0] ALU_Control;
wire CPU_MIO;
SCPU_ctrls m0 (
.OPcode(OPcode),
.Fun3(Fun3),
.Fun7(Fun7),
.MIO_ready(MIO_ready),
.ImmSel(ImmSel),
.ALUSrc_B(ALUSrc_B),
.MemtoReg(MemtoReg),
.Jump(Jump),
.Branch(Branch),
.RegWrite(RegWrite),
.MemRW(MemRW),
.ALU_Control(ALU_Control),
.CPU_MIO(CPU_MIO)
);
reg [31:0] inst_for_test;
`define LET_INST_BE(inst) \
inst_for_test = inst; \
OPcode = inst_for_test[6:2]; \
Fun3 = inst_for_test[14:12]; \
Fun7 = inst_for_test[30]; \
#50;
initial begin
$dumpfile("SCPU_ctrl.vcd");
$dumpvars(1, SCPU_ctrl_tb);
#5;
MIO_ready = 0;
#5;
`LET_INST_BE(32'h001100B3); //add x1, x2, x1
`LET_INST_BE(32'h400080B3); //sub x1, x1, x0
`LET_INST_BE(32'h002140B3); //xor x1, x2, x2
`LET_INST_BE(32'h002160B3); //or x1, x2, x2
`LET_INST_BE(32'h002170B3); //and x1, x2, x2
`LET_INST_BE(32'h002150B3); //srl x1, x2, x2
`LET_INST_BE(32'h002120B3); //slt x1, x2, x2
`LET_INST_BE(32'h3E810093); //addi x1, x2, 1000
`LET_INST_BE(32'h00A14093); //xori x1, x2, 10
`LET_INST_BE(32'h00116093); //ori x1, x2, 1
`LET_INST_BE(32'h00017093); //andi x1, x2, 0
`LET_INST_BE(32'h00515093); //srli x1, x2, 5
`LET_INST_BE(32'hFFF12093); //slti x1, x2, -1
`LET_INST_BE(32'h00812083); //lw x1, 8(x2)
`LET_INST_BE(32'h00C0A823); //sw x12, 16(x1)
`LET_INST_BE(32'hFE108AE3); //beq x1, x1, -12
`LET_INST_BE(32'hF9DFF06F); //jal x0, -100
`LET_INST_BE(32'h3FE000EF); //jal x1, 1023
#50; $finish();
end
endmodule
SCPU simulation
module testbench_tb();
reg clk;
reg rst;
testbench m0(.clk(clk), .rst(rst));
initial begin
clk = 1'b0;
rst = 1'b1;
#50;
rst = 1'b0;
end
always #10 clk = ~clk;
endmodule
SCPU simulation test / on-board program
j start # 00
dummy:
nop # 04
nop # 08
nop # 0C
nop # 10
nop # 14
nop # 18
nop # 1C
j dummy
start:
beq x0, x0, pass_0
li x31, 0
j dummy
pass_0:
li x1, -1 # x1=FFFFFFFF
xori x3, x1, 1 # x3=FFFFFFFE
add x3, x3, x3 # x3=FFFFFFFC
add x3, x3, x3 # x3=FFFFFFF8
add x3, x3, x3 # x3=FFFFFFF0
add x3, x3, x3 # x3=FFFFFFE0
add x3, x3, x3 # x3=FFFFFFC0
add x3, x3, x3 # x3=FFFFFF80
add x3, x3, x3 # x3=FFFFFF00
add x3, x3, x3 # x3=FFFFFE00
add x3, x3, x3 # x3=FFFFFC00
add x3, x3, x3 # x3=FFFFF800
add x3, x3, x3 # x3=FFFFF000
add x3, x3, x3 # x3=FFFFE000
add x3, x3, x3 # x3=FFFFC000
add x3, x3, x3 # x3=FFFF8000
add x3, x3, x3 # x3=FFFF0000
add x3, x3, x3 # x3=FFFE0000
add x3, x3, x3 # x3=FFFC0000
add x3, x3, x3 # x3=FFF80000
add x3, x3, x3 # x3=FFF00000
add x3, x3, x3 # x3=FFE00000
add x3, x3, x3 # x3=FFC00000
add x3, x3, x3 # x3=FF800000
add x3, x3, x3 # x3=FF000000
add x3, x3, x3 # x3=FE000000
add x3, x3, x3 # x3=FC000000
add x5, x3, x3 # x5=F8000000
add x3, x5, x5 # x3=F0000000
add x4, x3, x3 # x4=E0000000
add x6, x4, x4 # x6=C0000000
add x7, x6, x6 # x7=80000000
ori x8, zero, 1 # x8=00000001
ori x28, zero, 31
srl x29, x7, x28 # x29=00000001
beq x8, x29, pass_1
li x31, 1
j dummy
pass_1:
nop
sub x3, x6, x7 # x3=40000000
sub x4, x7, x3 # x4=40000000
slti x9, x0, 1 # x9=00000001
slt x10, x3, x4
slt x10, x4, x3 # x10=00000000
beq x9, x10, dummy # branch when x3 != x4
srli x29, x3, 30 # x29=00000001
beq x29, x9, pass_2
li x31, 2
j dummy
pass_2:
nop
# Test signed set-less-than
slti x10, x1, 3 # x10=00000001
slt x11, x5, x1 # signed(0xF8000000) < -1
# x11=00000001
slt x12, x1, x3 # x12=00000001
andi x10, x10, 0xff
and x10, x10, x11
and x10, x10, x12 # x10=00000001
li x11, 1
beq x10, x11, pass_3
li x31, 3
j dummy
pass_3:
nop
or x11, x7, x3 # x11=C0000000
beq x11, x6, pass_4
li x31, 4
j dummy
pass_4:
nop
li x18, 0x20 # base addr=0x20
### uncomment instr. below when simulating on venus
# srli x18, x7, 3 # base addr=10000000
sw x5, 0(x18) # mem[0x20]=F8000000
sw x4, 4(x18) # mem[0x24]=40000000
lw x29, 0(x18) # x29=mem[0x20]=F8000000
xor x29, x29, x5 # x29=00000000
sw x6, 0(x18) # mem[0x20]=C0000000
lw x30, 0(x18) # x30=mem[0x20]=C0000000
xor x29, x29, x30 # x29=C0000000
beq x6, x29, pass_5
li x31, 5
j dummy
pass_5:
li x31, 0x666
j dummy
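Results and Analysis
Simulation results
1. ImmGen simulation:
The ImmGen simulation feeds in instructions of 4 types, and a different immediate is output for each
The simulated waveform in the figure matches the reference waveform, so the result meets the requirements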
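The ImmGen testbench asks, next to the vector 32'h3FE000EF, whether ImmGen outputs 1023. Because the J-type format never encodes imm[0] and ImmGen appends 1'b0, the output cannot be odd; for this encoding it is 1022. The sketch below reproduces the 2'b11 concatenation of ImmGen in a stand-alone function to check this by hand. The names `j_imm` and `j_imm_check` are assumptions made for illustration.

```verilog
`timescale 1ns/1ps
// Illustrative sketch: j_imm and j_imm_check are hypothetical names.
module j_imm_check;
    // Same field order as the J-type branch of ImmGen:
    // imm[20]=inst[31], imm[19:12]=inst[19:12], imm[11]=inst[20],
    // imm[10:1]=inst[30:21], imm[0]=0, then sign-extend to 32 bits.
    function [31:0] j_imm;
        input [31:0] inst;
        begin
            j_imm = {{11{inst[31]}}, inst[31], inst[19:12],
                     inst[20], inst[30:21], 1'b0};
        end
    endfunction

    reg [31:0] imm;

    initial begin
        // Vector labelled "jal x1, 1023" in the testbench:
        // inst[30:21] = 10'b0111111111 = 511, all other immediate fields are 0.
        imm = j_imm(32'h3FE000EF);
        if (imm === 32'd1022)
            $display("OK: 3FE000EF -> %0d", imm);
        else
            $display("MISMATCH: 3FE000EF -> %h", imm);

        // Vector labelled "jal x0, -100" in the testbench.
        imm = j_imm(32'hF9DFF06F);
        if (imm === 32'hFFFFFF9C)
            $display("OK: F9DFF06F -> %0d", $signed(imm));
        else
            $display("MISMATCH: F9DFF06F -> %h", imm);
        $finish;
    end
endmodule
```

So a value of 0x000003FE on Imm_out for this vector is the expected result rather than an off-by-one error.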
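2. SCPU_Ctrl simulation:
The SCPU_Ctrl simulation feeds in instructions of 6 types, and different control signals are output for each
For example, the first instruction, 001100B3, is add x1, x2, x1
In the waveform, RegWrite=1 enables the RegFile write
ALUSrc_B=0 selects rs2, read from the RegFile, as ALU input B
ALU_Control=0 selects the add operation
As another example, the last instruction, 3FE000EF, is the one labelled jal x1, 1023 in the testbench
Here RegWrite=1 enables the RegFile write so that the return address is stored in x1
Jump=1 indicates a J-type instruction
On inspection, the control signals output for all instructions are correct, and the result matches expectations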
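3. SCPU simulation:
Because the waveform is too long, only its beginning and end are shown
At the end of the waveform, Reg31 becomes 0x666, which shows that all the preceding tests passed; the result matches expectations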
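Reading the pass marker off a long waveform is easy to get wrong, so the check could also be automated. The following is a minimal sketch, not part of the submitted lab code: `pass_monitor`, its parameters, and the cycle limit are all assumptions chosen for illustration. It ends the simulation once the watched value equals the marker, or reports a timeout.

```verilog
`timescale 1ns/1ps
// Hypothetical helper: stops the simulation when `value` reaches PASS_VALUE,
// or after MAX_CYCLES clock cycles without reaching it.
module pass_monitor #(
    parameter [31:0] PASS_VALUE = 32'h666,
    parameter integer MAX_CYCLES = 2000      // assumed limit, tune per program
)(
    input wire        clk,
    input wire        rst,
    input wire [31:0] value
);
    integer cycles;
    initial cycles = 0;

    always @(posedge clk) begin
        if (rst) begin
            cycles <= 0;
        end else begin
            cycles <= cycles + 1;
            if (value == PASS_VALUE) begin
                $display("PASS: marker %h reached", value);
                $finish;
            end else if (cycles >= MAX_CYCLES) begin
                $display("TIMEOUT: value = %h", value);
                $finish;
            end
        end
    end
endmodule

// Small stand-alone demo so the sketch can be run without the CPU.
module pass_monitor_demo;
    reg        clk   = 1'b0;
    reg        rst   = 1'b1;
    reg [31:0] value = 32'h0;

    pass_monitor #(.MAX_CYCLES(100)) mon(.clk(clk), .rst(rst), .value(value));

    always #10 clk = ~clk;

    initial begin
        #55  rst   = 1'b0;
        #200 value = 32'h666;   // stands in for x31 reaching the pass marker
    end
endmodule
```

In testbench_tb the monitor could be attached with `pass_monitor mon(.clk(clk), .rst(rst), .value(m0.u0.Reg31));`, where `m0.u0.Reg31` is a hierarchical reference to the Reg31 output of the SCPU instance. Since the test program ends in the `j dummy` loop, x31 keeps the marker value once it is written.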
Lab 4-3
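Procedure and Experimental Steps
Code Hierarchy Diagram and Description
The complete DataPath after the instruction extension is shown in the diagram below:
Compared with the previous DataPath, the main changes are:
- Widened the Branch signal and added a multiplexer to support the different B-type instructions
- Widened the Jump signal to support the jalr instruction
- Widened the RAM write enable so that the 4 bytes of a word can be write-enabled individually
- Added paths for lui and auipc: ImmGen generates an additional immediate type, and reg_wt_data has additional selections
Source Code
1. ImmGen code: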
module ImmGen(
input [2:0] ImmSel,
input [31:0] inst_field,
output reg [31:0] Imm_out
);
always @(*) begin
case (ImmSel)
3'b000: // I-type
Imm_out = {{20{inst_field[31]}}, inst_field[31:20]};
3'b001: // S-type
Imm_out = {{20{inst_field[31]}}, inst_field[31:25], inst_field[11:7]}; //
3'b010: // B-type
Imm_out = {{19{inst_field[31]}}, inst_field[31], inst_field[7], inst_field[30:25], inst_field[11:8], 1'b0};
3'b011: // J-type
Imm_out = {{11{inst_field[31]}}, inst_field[31], inst_field[19:12], inst_field[20], inst_field[30:21], 1'b0}; //
3'b100: // U_type
Imm_out = {inst_field[31:12], 12'b0}; // high 20 bit Imm, low 12 bit 0
default: Imm_out = 32'b0; //
endcase
end
endmodule
2. SCPU_Ctrl code:
module SCPU_ctrls(
input [4:0] OPcode,
input [2:0] Fun3,
input Fun7,
input MIO_ready,
output reg [2:0] ImmSel,
output reg ALUSrc_B,
output reg [2:0] MemtoReg,
output reg [1:0] Jump,
output reg [2:0] Branch,
output reg RegWrite,
output reg MemRW,
output reg [3:0] WHBU,
output reg [3:0] ALU_Control,
output reg CPU_MIO
);
initial begin
ImmSel = 3'b000;
ALUSrc_B = 1'b0;
MemtoReg = 3'b000; // 0: ALU result 1: Load from RAM to reg 2: PC+4 (JAL/JALR) 3: Imm (lui) 4: PC+Imm (auipc)
MemRW = 1'b0; // write to/read from RAM
WHBU = 4'b0;
Jump = 2'b00;
Branch = 3'b000;
RegWrite = 1'b0;
ALU_Control = 4'b0000;
//CPU_MIO = 1'b0;
end
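always @(*) begin
// Default values: an output that a branch below does not assign (for example
// MemRW in the R-type branch) would otherwise hold its previous value as a latch
ImmSel = 3'b000;
ALUSrc_B = 1'b0;
MemtoReg = 3'b000;
MemRW = 1'b0;
WHBU = 4'b0000;
Jump = 2'b00;
Branch = 3'b000;
RegWrite = 1'b0;
ALU_Control = 4'b0000;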
case (OPcode)
5'b01100: begin // R-type
WHBU = 4'b0;
ALUSrc_B = 1'b0;
MemtoReg = 3'b000;
RegWrite = 1'b1;
//ImmSel = 2'b00; // no imm
//MemRW = 1'bx; // no read or write
Jump = 2'b00;
Branch = 3'b000;
case ({Fun3, Fun7})
4'b0000: ALU_Control = 4'b0000; // ADD
4'b0001: ALU_Control = 4'b0001; // SUB
4'b0010: ALU_Control = 4'b0010; // SLL
4'b0100: ALU_Control = 4'b0011; // SLT
4'b0110: ALU_Control = 4'b0100; // SLTU
4'b1000: ALU_Control = 4'b0101; // XOR
4'b1010: ALU_Control = 4'b0110; // SRL
4'b1011: ALU_Control = 4'b0111; // SRA
4'b1100: ALU_Control = 4'b1000; // OR
4'b1110: ALU_Control = 4'b1001; // AND
endcase
end
5'b00000: begin // Load
ALUSrc_B = 1'b1;
MemtoReg = 3'b001;
RegWrite = 1'b1;
ImmSel = 3'b000;
MemRW = 1'b0; // read
case (Fun3)
3'b000: WHBU = 4'b0010; // LB
3'b001: WHBU = 4'b0100; // LH
3'b010: WHBU = 4'b1000; // LW
3'b100: WHBU = 4'b0011; // LBU
3'b101: WHBU = 4'b0101; // LHU
endcase
Jump = 2'b00;
Branch = 3'b000;
ALU_Control = 4'b0000; // ADD for address calculation
end
5'b01000: begin // Store
ALUSrc_B = 1'b1;
//MemtoReg = 2'b01;
MemtoReg = 3'bx;
//RegWrite = 1'b1;
RegWrite = 1'b0;
ImmSel = 3'b001;
MemRW = 1'b1; // write
case (Fun3)
3'b000: WHBU = 4'b0010; // SB
3'b001: WHBU = 4'b0100; // SH
3'b010: WHBU = 4'b1000; // SW
endcase
Jump = 2'b00;
Branch = 3'b000;
ALU_Control = 4'b0000; // ADD for address calculation
end
5'b11000: begin // Branch
WHBU = 4'b0000;
ALUSrc_B = 1'b0;
MemtoReg = 3'bx; // new
RegWrite = 1'b0; //
ImmSel = 3'b010;
//MemRW = 1'bx;
//Branch = 1'b1;
Jump = 2'b00;
case (Fun3)
3'b000: begin Branch = 3'b001; ALU_Control = 4'd1; end // BEQ, do SUB in ALU
3'b001: begin Branch = 3'b010; ALU_Control = 4'd1; end // BNE
3'b100: begin Branch = 3'b011; ALU_Control = 4'd3; end // BLT, do SLT in ALU
3'b101: begin Branch = 3'b100; ALU_Control = 4'd3; end // BGE
3'b110: begin Branch = 3'b101; ALU_Control = 4'd4; end // BLTU, do SLTU in ALU
3'b111: begin Branch = 3'b110; ALU_Control = 4'd4; end // BGEU
//default: begin Branch = 3'b000; ALU_Control = 4'd0; end
endcase
end
5'b11011: begin // JAL
WHBU = 4'b0000;
//ALUSrc_B = 1'b1;
ALUSrc_B = 1'bx;
MemtoReg = 3'b010; // PC + 4
RegWrite = 1'b1; //
ImmSel = 3'b011;
// MemRW = 1'bx;
Jump = 2'b01;
Branch = 3'b000;
end
5'b00100: begin // I-type ALU
WHBU = 4'b0000;
ALUSrc_B = 1'b1;
MemtoReg = 3'b000;
RegWrite = 1'b1;
ImmSel = 3'b000;
//MemRW = 1'bx;
Jump = 2'b00;
Branch = 3'b000;
case (Fun3)
3'b000: ALU_Control = 4'b0000; // ADDI
3'b001: ALU_Control = 4'b0010; // SLLI
3'b010: ALU_Control = 4'b0011; // SLTI
3'b011: ALU_Control = 4'b0100; // SLTIU
3'b100: ALU_Control = 4'b0101; // XORI
3'b101: ALU_Control = (Fun7) ? 4'b0111 : 4'b0110; // SRAI / SRLI
3'b110: ALU_Control = 4'b1000; // ORI
3'b111: ALU_Control = 4'b1001; // ANDI
endcase
end
5'b11001: begin // I-type JALR
WHBU = 4'b0000;
ALUSrc_B = 1'b1;
MemtoReg = 3'b010; // PC + 4
RegWrite = 1'b1; //
ImmSel = 3'b000;
//MemRW = 1'b0;
MemRW = 1'bx;
Jump = 2'b10;
Branch = 3'b000;
ALU_Control = 4'd0; // ADD
end
5'b01101: begin // lui
WHBU = 4'b0000;
ALUSrc_B = 1'bx;
MemtoReg = 3'b011; // lui_res = Imm
RegWrite = 1'b1; //
ImmSel = 3'b100; // U-type
//MemRW = 1'b0;
MemRW = 1'bx;
Jump = 2'b00;
Branch = 3'b000;
ALU_Control = 4'dx; // don't care: lui writes ImmOut directly
end
5'b00101: begin // auipc
WHBU = 4'b0000;
ALUSrc_B = 1'bx;
MemtoReg = 3'b100; // auipc_res = PC + Imm
RegWrite = 1'b1; //
ImmSel = 3'b100;
//MemRW = 1'b0;
MemRW = 1'bx;
Jump = 2'b00;
Branch = 3'b000;
ALU_Control = 4'dx; // don't care: auipc uses the separate PC + Imm adder
end
endcase
//CPU_MIO = MIO_ready;
end
endmodule
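The new WHBU signal encodes the Word, Half, Byte, and Unsigned attributes of a load or store, one bit each from the most significant bit to the least significant bit
In addition, the Branch and Jump signals are widened and U-type instructions are added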
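The WHBU values used in SCPU_Ctrl follow directly from funct3: funct3[1:0] selects the access size and funct3[2] marks an unsigned load. The sketch below expresses that relationship in one line and checks it against the five load encodings from the controller. The names `whbu_from_funct3` and `whbu_check` are hypothetical; the controller itself uses an explicit case statement.

```verilog
`timescale 1ns/1ps
// Illustrative sketch: whbu_from_funct3 and whbu_check are hypothetical names.
module whbu_check;
    // bit 3 = Word, bit 2 = Half, bit 1 = Byte, bit 0 = Unsigned.
    // Note: this also returns a value for funct3 encodings that are not
    // valid loads, which the case statement in the controller does not.
    function [3:0] whbu_from_funct3;
        input [2:0] f3;
        begin
            whbu_from_funct3 = {f3[1:0] == 2'b10,   // W
                                f3[1:0] == 2'b01,   // H
                                f3[1:0] == 2'b00,   // B
                                f3[2]};             // U
        end
    endfunction

    initial begin
        if (whbu_from_funct3(3'b000) !== 4'b0010) $display("MISMATCH: LB");
        if (whbu_from_funct3(3'b001) !== 4'b0100) $display("MISMATCH: LH");
        if (whbu_from_funct3(3'b010) !== 4'b1000) $display("MISMATCH: LW");
        if (whbu_from_funct3(3'b100) !== 4'b0011) $display("MISMATCH: LBU");
        if (whbu_from_funct3(3'b101) !== 4'b0101) $display("MISMATCH: LHU");
        $display("WHBU check finished");
        $finish;
    end
endmodule
```

The store encodings SB, SH, and SW use the same three size values with the U bit at 0.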
3. DataPath code:
module DataPaths(
input [2:0] ImmSel, input ALUSrc_B, input [2:0] MemtoReg, input [1:0] Jump, input [2:0] Branch, input RegWrite, input [3:0] ALU_Control,
input [31:0] Data_in, input clk, input [31:0] inst_field, input rst, input [3:0] WHBU,
output [31:0] Reg00, output [31:0] Reg01, output [31:0] Reg02, output [31:0] Reg03,
output [31:0] Reg04, output [31:0] Reg05, output [31:0] Reg06, output [31:0] Reg07,
output [31:0] Reg08, output [31:0] Reg09, output [31:0] Reg10, output [31:0] Reg11,
output [31:0] Reg12, output [31:0] Reg13, output [31:0] Reg14, output [31:0] Reg15,
output [31:0] Reg16, output [31:0] Reg17, output [31:0] Reg18, output [31:0] Reg19,
output [31:0] Reg20, output [31:0] Reg21, output [31:0] Reg22, output [31:0] Reg23,
output [31:0] Reg24, output [31:0] Reg25, output [31:0] Reg26, output [31:0] Reg27,
output [31:0] Reg28, output [31:0] Reg29, output [31:0] Reg30, output [31:0] Reg31,
output [31:0] ALU_out, output reg [31:0] Data_out, output [31:0] PC_out,
output reg [3:0] wea);
reg branch;
wire Branch_one;
wire [31:0] branch_out;
wire ALU_zero;
wire ALU_overflow;
wire [31:0] PCAddImm;
wire [31:0] ALU_A;
wire [31:0] ALU_B;
reg [31:0] reg_wt_data;
wire [31:0] PCAdd4;
reg [31:0] PC_in;
wire [31:0] ImmOut;
wire [1:0] LSwea;
reg [31:0] LoadData;
wire [31:0] ReadData;
assign LSwea = ALU_out[1:0]; // byte offset of the effective address (rs1 + imm), not of the immediate alone
assign PCAdd4 = PC_out + 32'd4;
assign Branch_one = (Branch == 3'b000) ? 1'b0 : 1'b1;
assign PCAddImm = ImmOut + PC_out; // ALU ? Adder!
always @ (*) begin
case (MemtoReg)
3'd0: reg_wt_data = ALU_out; // ALU
3'd1: begin
case (WHBU)
4'b0010: reg_wt_data = {{24{LoadData[7]}}, LoadData[7:0]}; // LB
4'b0100: reg_wt_data = {{16{LoadData[15]}}, LoadData[15:0]}; // LH
4'b1000: reg_wt_data = LoadData; // LW
4'b0011: reg_wt_data = {24'b0, LoadData[7:0]}; // LBU
4'b0101: reg_wt_data = {16'b0, LoadData[15:0]}; // LHU
default: reg_wt_data = 0;
endcase
end
3'd2: reg_wt_data = PCAdd4; // JAL
3'd3: reg_wt_data = ImmOut; // lui
3'd4: reg_wt_data = PCAddImm; // auipc
default: reg_wt_data = 32'b0;
endcase
end
always @ (*) begin
case (Branch)
3'b001: branch = Branch_one & (ALU_zero); // BEQ
3'b010: branch = Branch_one & (~ALU_zero); // BNE
3'b011: branch = Branch_one & (ALU_out[0]); // BLT
3'b100: branch = Branch_one & (~ALU_out[0]); // BGE
3'b101: branch = Branch_one & (ALU_out[0]); // BLTU
3'b110: branch = Branch_one & (~ALU_out[0]); // BGEU
default: branch = 1'b0;
endcase
case (Jump)
2'b00: PC_in = branch_out;
2'b01: PC_in = PCAddImm;
2'b10: PC_in = ALU_out;
default: PC_in = 32'b0;
endcase
end
always @ (*) begin
case (LSwea)
2'b00: begin
LoadData = Data_in;
Data_out = ReadData;
case (WHBU)
4'b0010: wea = 4'b0001;
4'b0100: wea = 4'b0011;
4'b1000: wea = 4'b1111;
default: wea = 0;
endcase
end
2'b01: begin
LoadData = {8'b0, Data_in[31:8]};
Data_out = {ReadData[23:0], 8'b0};
case (WHBU)
4'b0010: wea = 4'b0010;
4'b0100: wea = 4'b0110;
default: wea = 0;
endcase
end
2'b10: begin
LoadData = {16'b0, Data_in[31:16]};
Data_out = {ReadData[15:0], 16'b0};
case (WHBU)
4'b0010: wea = 4'b0100;
4'b0100: wea = 4'b1100;
default: wea = 0;
endcase
end
2'b11: begin
LoadData = {24'b0, Data_in[31:24]};
Data_out = {ReadData[7:0], 24'b0};
case (WHBU)
4'b0010: wea = 4'b1000;
default: wea = 0;
endcase
end
endcase
end
assign ALU_B = ALUSrc_B ? ImmOut : ReadData;
assign branch_out = branch ? PCAddImm : PCAdd4;
ALU alu(.A(ALU_A), .B(ALU_B), .ALU_operation(ALU_Control), .res(ALU_out),
.zero(ALU_zero), .overflow(ALU_overflow));
wire [4:0] RS1, RS2, WT;
assign RS1 = inst_field[19:15];
assign RS2 = inst_field[24:20];
assign WT = inst_field[11:7];
Regs regs(.clk(clk), .rst(rst), .Rs1_addr(RS1), .Rs2_addr(RS2),
.Wt_addr(WT), .Wt_data(reg_wt_data), .RegWrite(RegWrite),
.Rs1_data(ALU_A), .Rs2_data(ReadData),
.Reg00(Reg00),.Reg01(Reg01),.Reg02(Reg02),.Reg03(Reg03),
.Reg04(Reg04),.Reg05(Reg05),.Reg06(Reg06),.Reg07(Reg07),
.Reg08(Reg08),.Reg09(Reg09),.Reg10(Reg10),.Reg11(Reg11),
.Reg12(Reg12),.Reg13(Reg13),.Reg14(Reg14),.Reg15(Reg15),
.Reg16(Reg16),.Reg17(Reg17),.Reg18(Reg18),.Reg19(Reg19),
.Reg20(Reg20),.Reg21(Reg21),.Reg22(Reg22),.Reg23(Reg23),
.Reg24(Reg24),.Reg25(Reg25),.Reg26(Reg26),.Reg27(Reg27),
.Reg28(Reg28),.Reg29(Reg29),.Reg30(Reg30),.Reg31(Reg31));
ImmGen immgen(.ImmSel(ImmSel), .inst_field(inst_field), .Imm_out(ImmOut));
Reg PC(.clk(clk), .rst(rst), .CE(1'b1), .D(PC_in), .Q(PC_out));
endmodule
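The signals to and from RAM are also changed to support the new WHBU signal:
For the value written to RAM, ReadData holds the value read from the RegFile, Data_out holds that value shifted according to the byte offset, and Data_out is what is sent out and written to RAM
For the value loaded into a register, Data_in holds the value read from RAM, which is shifted to produce LoadData and finally written to the RegFile
A new 4-bit enable signal, wea, is also added; each bit controls the write enable of one of the 4 bytes of the RAM word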
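The store path described above can be isolated into a small module to see how the byte offset moves the data and the enable bits together. This is a sketch under assumptions: `store_align`, its port names, and the 2-bit `size` input (taken to be funct3[1:0] of the store) are illustrative, whereas the DataPath in this report derives the same values from WHBU inside a nested case statement.

```verilog
`timescale 1ns/1ps
// Hypothetical stand-alone sketch of the store alignment logic.
module store_align(
    input  wire [1:0]  offset,    // byte offset within the 32-bit word
    input  wire [1:0]  size,      // 0: byte, 1: half-word, 2: word
    input  wire [31:0] rs2_data,  // value read from the register file
    output reg  [3:0]  wea,       // one write-enable bit per byte lane
    output wire [31:0] wdata      // data aligned to the addressed lanes
);
    // Move the low-order bytes of rs2 onto the addressed byte lanes (offset * 8 bits).
    assign wdata = rs2_data << {offset, 3'b000};

    always @(*) begin
        case (size)
            2'd0:    wea = 4'b0001 << offset;                                   // SB
            2'd1:    wea = (offset == 2'b11) ? 4'b0000 : (4'b0011 << offset);   // SH, must not cross the word
            2'd2:    wea = (offset == 2'b00) ? 4'b1111 : 4'b0000;               // SW, aligned only
            default: wea = 4'b0000;
        endcase
    end
endmodule

module store_align_tb;
    reg  [1:0]  offset, size;
    reg  [31:0] rs2_data;
    wire [3:0]  wea;
    wire [31:0] wdata;

    store_align dut(.offset(offset), .size(size), .rs2_data(rs2_data),
                    .wea(wea), .wdata(wdata));

    initial begin
        // sb x28, 2(x18) from the test program, with x28 = 0xD0 and an aligned base
        offset = 2'd2; size = 2'd0; rs2_data = 32'h000000D0;
        #1;
        if (wea !== 4'b0100 || wdata !== 32'h00D00000)
            $display("MISMATCH: wea = %b, wdata = %h", wea, wdata);
        else
            $display("OK: wea = %b, wdata = %h", wea, wdata);
        $finish;
    end
endmodule
```

With only lane 2 enabled, a memory word holding 0000CBA0 becomes 00D0CBA0, which is the value the test program expects after the sb instruction.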
Key Simulation Steps
The simulation code is the same as in Lab 4-2
Simulation test program:
auipc x1, 0
j start # 04
dummy:
nop # 08
nop # 0C
nop # 10
nop # 14
nop # 18
nop # 1C
nop # 20
j dummy
start:
bnez x1, dummy
beq x0, x0, pass_0
li x31, 0
auipc x30, 0
j dummy
pass_0:
li x31, 1
bne x0, x0, dummy
bltu x0, x0, dummy
li x1, -1 # x1=FFFFFFFF
xori x3, x1, 1 # x3=FFFFFFFE
add x3, x3, x3 # x3=FFFFFFFC
add x3, x3, x3 # x3=FFFFFFF8
add x3, x3, x3 # x3=FFFFFFF0
add x3, x3, x3 # x3=FFFFFFE0
add x3, x3, x3 # x3=FFFFFFC0
add x3, x3, x3 # x3=FFFFFF80
add x3, x3, x3 # x3=FFFFFF00
add x3, x3, x3 # x3=FFFFFE00
add x3, x3, x3 # x3=FFFFFC00
add x3, x3, x3 # x3=FFFFF800
add x3, x3, x3 # x3=FFFFF000
add x3, x3, x3 # x3=FFFFE000
add x3, x3, x3 # x3=FFFFC000
add x3, x3, x3 # x3=FFFF8000
add x3, x3, x3 # x3=FFFF0000
add x3, x3, x3 # x3=FFFE0000
add x3, x3, x3 # x3=FFFC0000
add x3, x3, x3 # x3=FFF80000
add x3, x3, x3 # x3=FFF00000
add x3, x3, x3 # x3=FFE00000
add x3, x3, x3 # x3=FFC00000
add x3, x3, x3 # x3=FF800000
add x3, x3, x3 # x3=FF000000
add x3, x3, x3 # x3=FE000000
add x3, x3, x3 # x3=FC000000
add x5, x3, x3 # x5=F8000000
add x3, x5, x5 # x3=F0000000
add x4, x3, x3 # x4=E0000000
add x6, x4, x4 # x6=C0000000
add x7, x6, x6 # x7=80000000
ori x8, zero, 1 # x8=00000001
ori x28, zero, 31
srl x29, x7, x28 # x29=00000001
auipc x30, 0
bne x8, x29, dummy
auipc x30, 0
blt x8, x7, dummy
sra x29, x7, x28 # x29=FFFFFFFF
and x29, x29, x3 # x29=x3=F0000000
auipc x30, 0
bne x3, x29, dummy
mv x29, x8 # x29=x8=00000001
bltu x29, x7, pass_1 # unsigned 00000001 < 80000000
auipc x30, 0
j dummy
pass_1:
nop
li x31, 2
sub x3, x6, x7 # x3=40000000
sub x4, x7, x3 # x4=40000000
slti x9, x0, 1 # x9=00000001
slt x10, x3, x4
slt x10, x4, x3 # x10=00000000
auipc x30, 0
beq x9, x10, dummy # branch when x3 != x4
srli x29, x3, 30 # x29=00000001
beq x29, x9, pass_2
auipc x30, 0
j dummy
pass_2:
nop
# Test set-less-than
li x31, 3
slti x10, x1, 3 # x10=00000001
slt x11, x5, x1 # signed(0xF8000000) < -1
# x11=00000001
slt x12, x1, x3 # x12=00000001
andi x10, x10, 0xff
and x10, x10, x11
and x10, x10, x12 # x10=00000001
auipc x30, 0
beqz x10, dummy
sltu x10, x1, x8 # unsigned FFFFFFFF < 00000001 ?
auipc x30, 0
bnez x10, dummy
sltu x10, x8, x3 # unsigned 00000001 < 40000000 ?
auipc x30, 0
beqz x10, dummy
sltiu x10, x1, 3
auipc x30, 0
bnez x10, dummy
li x11, 1
bne x10, x11, pass_3
auipc x30, 0
j dummy
pass_3:
nop
li x31, 4
or x11, x7, x3 # x11=C0000000
beq x11, x6, pass_4
auipc x30, 0
j dummy
pass_4:
nop
li x31, 5
li x18, 0x20 # base addr=00000020
### uncomment instr. below when simulating on venus
# lui x18, 0x10000 # base addr=10000000
sw x5, 0(x18) # mem[0x20]=F8000000
sw x4, 4(x18) # mem[0x24]=40000000
lw x27, 0(x18) # x27=mem[0x20]=F8000000
xor x27, x27, x5 # x27=00000000
sw x6, 0(x18) # mem[0x20]=C0000000
lw x28, 0(x18) # x28=mem[0x20]=C0000000
xor x27, x6, x28 # x27=00000000
auipc x30, 0
bnez x27, dummy
lui x20, 0xA0000 # x20=A0000000
sw x20, 8(x18) # mem[0x28]=A0000000
lui x27, 0xFEDCB # x27=FEDCB000
srai x27, x27, 12 # x27=FFFFEDCB
li x28, 8
sll x27, x27, x28 # x27=FFEDCB00
ori x27, x27, 0xff # x27=FFEDCBFF
lb x29, 11(x18) # x29=FFFFFFA0, little-endian, signed-ext
and x27, x27, x29 # x27=FFEDCBA0
sw x27, 8(x18) # mem[0x28]=FFEDCBA0
lhu x27, 8(x18) # x27=0000CBA0
lui x20, 0xFFFF0 # x20=FFFF0000
and x20, x20, x27 # x20=00000000
auipc x30, 0
bnez x20, dummy # check unsigned-ext
li x31, 6
lbu x28, 10(x18) # x28=000000ED
lbu x29, 11(x18) # x29=000000FF
slli x29, x29, 8 # x29=0000FF00
or x29, x29, x28 # x29=0000FFED
slli x29, x29, 16
or x29, x27, x29 # x29=FFEDCBA0
lw x28, 8(x18) # x28=FFEDCBA0
auipc x30, 0
bne x28, x29, dummy
sw x0, 0(x18) # mem[0x20]=00000000
sh x27, 0(x18) # mem[0x20]=0000CBA0
li x28, 0xD0
sb x28, 2(x18) # mem[0x20]=00D0CBA0
lw x28, 0(x18) # x28=00D0CBA0
li x29, 0x00D0CBA0
auipc x30, 0
bne x28, x29, dummy
lh x27, 2(x18) # x27=000000D0
li x28, 0xD0
auipc x30, 0
bne x27, x28, dummy
pass_5:
li x31, 7
auipc x30, 0
bge x1, x0, dummy # -1 >= 0 ?
bge x8, x1, pass_6 # 1 >= -1 ?
auipc x30, 0
j dummy
pass_6:
auipc x30, 0
bgeu x0, x1, dummy # 0 >= FFFFFFFF ?
auipc x30, 0
bgeu x8, x1, dummy
auipc x20, 0
jalr x21, x0, pass_7 # just for test : (
auipc x30, 0
j dummy
pass_7:
# jalr ->
addi x20, x20, 8
auipc x30, 0
bne x20, x21, dummy
li x31, 0x666
j dummy
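The newly added instructions are tested on top of the Lab 4-2 program; as before, the program enters the dummy loop after passing all tests.
Experimental results and analysis
Simulation results
Because the full waveform is too long, only its beginning and end are shown.
As can be seen, Reg31 ends up as 0x666, which shows that the program passed all preceding tests and entered the dummy loop; the result matches expectations.
Lab 4-4
Procedure and experimental steps
Code design and description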
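1. CSR registers and their instructions
Five trap-related CSRs are implemented:
mstatus: Machine Status Register, which holds the current control state. Bit 3 is MIE, the global interrupt enable: in this design MIE = 1 means traps may be taken, and MIE is cleared to 0 while an exception/interrupt is being handled.
mtvec: Machine Trap-Vector Base-Address Register, which holds the base address of the trap vector.
mcause: Machine Cause Register, which holds the cause of the current trap. If the trap was caused by an interrupt, the most significant bit (the interrupt bit) is set to 1; for an exception it is 0.
mtval: Machine Trap Value Register, which holds exception-specific information to help software handle the exception; formerly called mbadaddr.
mepc: Machine Exception Program Counter, which holds the address of the instruction that was about to execute when the trap was triggered; mret uses it as the return address.
In addition, 6 instructions are implemented, csrrw, csrrs, csrrc, csrrwi, csrrsi and csrrci, which directly modify these 5 registers.
To this end, a CSRRegs module is implemented to manage these registers.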
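The three update modes shared by the six CSR instructions (write, set, clear) can be illustrated with a small Python model. This is a sketch under assumptions: csr_update and csr_instruction are hypothetical helper names, and the CSR file is modelled as a plain dictionary keyed by CSR address.

```python
MASK32 = 0xFFFFFFFF

def csr_update(old, operand, mode):
    """New CSR value for 'write' (csrrw/csrrwi), 'set' (csrrs/csrrsi)
    or 'clear' (csrrc/csrrci)."""
    if mode == "write":
        new = operand
    elif mode == "set":
        new = old | operand
    elif mode == "clear":
        new = old & ~operand
    else:
        raise ValueError(f"unknown mode: {mode}")
    return new & MASK32

def csr_instruction(csrs, addr, operand, mode):
    """Read the old CSR value (this is what goes to rd) and update the CSR.

    For the register forms the operand is rs1; for the immediate forms
    it is the zero-extended 5-bit uimm.
    """
    old = csrs[addr]
    csrs[addr] = csr_update(old, operand, mode)
    return old

csrs = {0x300: 0x8, 0x305: 0x7C}              # mstatus, mtvec
rd = csr_instruction(csrs, 0x300, 0x8, "clear")
print(hex(rd), hex(csrs[0x300]))              # old value, then cleared value
```

Reading a CSR without changing it is then simply a set with operand 0, which is how the trap program later reads the registers with csrrs and x0.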
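2. Exception and interrupt handling
Two instructions need to be implemented:
ecall: environment call instruction (software-triggered trap)
mret: return-from-trap instruction
An external interrupt signal is also added to support hardware interrupts.
To this end, an RV_INT module is implemented; it generates the bypass inputs fed to the CSRRegs module as well as the signal that redirects the pc.
SCPU_Ctrl is also modified to recognize the new instructions and to generate the trap signals.
3. Trap program
When an exception or interrupt is triggered, execution enters the trap program.
The program reads mepc, mcause, mtval, mstatus and mtvec into general-purpose registers. To avoid overwriting live values, it uses registers that the preceding code does not use.
It then adjusts mepc:
For an exception (illegal instruction), mepc <- mepc + 4.
For an interrupt, if it is a software interrupt (ecall), mepc <- mepc + 4.
If it is a hardware interrupt, mepc <- mepc. (mcause is used to distinguish the cases.)
Finally, mret is called to return to the original program. (At this point the state saved on entering the handler must be restored.)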
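The mepc adjustment rule above fits in one small function. The following Python sketch assumes the standard mcause layout (bit 31 is the interrupt bit); return_address is a hypothetical name used only for illustration.

```python
INTERRUPT_BIT = 1 << 31   # mcause[31] = 1 for interrupts, 0 for exceptions

def return_address(mepc, mcause):
    """Address that mret should return to, following the trap program's rule."""
    if mcause & INTERRUPT_BIT:
        # Hardware interrupt: the interrupted instruction has not executed
        # yet, so return to it unchanged.
        return mepc
    # Illegal instruction or ecall: skip the trapping instruction.
    return (mepc + 4) & 0xFFFFFFFF

for mepc, mcause in [(0x54, 0x00000002), (0x64, 0x0000000B), (0x6C, 0x8000000B)]:
    print(hex(mepc), "->", hex(return_address(mepc, mcause)))
```

The three sample pairs correspond to an illegal instruction, an ecall and an external interrupt, with the mepc values reported in the simulation results of this lab.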
Source code
1. CSRRegs code:
module CSRRegs(
input clk,
input rst,
input [11:0] raddr,
input [11:0] waddr,
input [31:0] wdata,
input csr_w,
input [1:0] csr_wsc_mode,
input expt_int,
input [31:0] mepc_bypass_in,
input [31:0] mcause_bypass_in,
input [31:0] mtval_bypass_in,
input [31:0] mstatus_bypass_in,
output reg [31:0] rdata,
output reg [31:0] mstatus,
output reg [31:0] mtvec,
output reg [31:0] mepc,
output reg [31:0] mcause,
output reg [31:0] mtval
);
//reg [31:0] csr [4095:0];
localparam CSR_WSC_WRITE = 2'b00;
localparam CSR_WSC_SET = 2'b01;
localparam CSR_WSC_CLEAR = 2'b10;
localparam CSR_MSTATUS = 12'h300;
localparam CSR_MTVEC = 12'h305;
localparam CSR_MEPC = 12'h341;
localparam CSR_MCAUSE = 12'h342;
localparam CSR_MTVAL = 12'h343;
always @(*) begin
case (raddr)
CSR_MSTATUS: rdata = mstatus;
CSR_MTVEC: rdata = mtvec;
CSR_MEPC: rdata = mepc;
CSR_MCAUSE: rdata = mcause;
CSR_MTVAL: rdata = mtval;
default: rdata = 32'd0;
endcase
end
always @(posedge clk or posedge rst) begin
if (rst) begin
mstatus <= 32'd8;
mtvec <= 32'h7C;
mepc <= 32'b0;
mcause <= 32'b0;
mtval <= 32'b0;
end else begin
if (expt_int) begin
mepc <= mepc_bypass_in;
mcause <= mcause_bypass_in;
mtval <= mtval_bypass_in;
mstatus <= mstatus_bypass_in;
end else if (csr_w) begin
case (waddr)
CSR_MSTATUS: begin
case (csr_wsc_mode)
CSR_WSC_WRITE: mstatus <= wdata;
CSR_WSC_SET: mstatus <= mstatus | wdata;
CSR_WSC_CLEAR: mstatus <= mstatus & ~wdata;
default: mstatus <= mstatus;
endcase
end
CSR_MTVEC: begin
case (csr_wsc_mode)
CSR_WSC_WRITE: mtvec <= wdata;
CSR_WSC_SET: mtvec <= mtvec | wdata;
CSR_WSC_CLEAR: mtvec <= mtvec & ~wdata;
default: mtvec <= mtvec;
endcase
end
CSR_MEPC: begin
case (csr_wsc_mode)
CSR_WSC_WRITE: mepc <= wdata;
CSR_WSC_SET: mepc <= mepc | wdata;
CSR_WSC_CLEAR: mepc <= mepc & ~wdata;
default: mepc <= mepc;
endcase
end
CSR_MCAUSE: begin
case (csr_wsc_mode)
CSR_WSC_WRITE: mcause <= wdata;
CSR_WSC_SET: mcause <= mcause | wdata;
CSR_WSC_CLEAR: mcause <= mcause & ~wdata;
default: mcause <= mcause;
endcase
end
CSR_MTVAL: begin
case (csr_wsc_mode)
CSR_WSC_WRITE: mtval <= wdata;
CSR_WSC_SET: mtval <= mtval | wdata;
CSR_WSC_CLEAR: mtval <= mtval & ~wdata;
default: mtval <= mtval;
endcase
end
default: mtval <= mtval;
endcase
end
end
end
endmodule
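This module lets the 6 CSR instructions modify the registers; in addition, when an exception or interrupt is triggered, the bypass inputs update mepc, mcause, mtval and mstatus at the same time.
2. RV_INT code: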
module RV_INT (
input clk,
input rst,
input INT,
input ecall,
input mret,
input illegal_inst,
input [31:0] mstatus,
input [31:0] mtvec,
input [31:0] mepc,
input [31:0] inst,
input [31:0] pc_current,
output en,
output reg expt_int,
output reg pc_change,
output reg [31:0] mepc_bypass_in,
output reg [31:0] mcause_bypass_in,
output reg [31:0] mtval_bypass_in,
output reg [31:0] mstatus_bypass_in,
output reg [31:0] pc
);
localparam MCAUSE_INT_EXTERNAL = 32'h8000000B;
localparam MCAUSE_EXC_ECALL = 32'h0000000B; // ECALL
localparam MCAUSE_EXC_ILLEGAL = 32'h00000002;
// localparam MCAUSE_EXC_L_ACCESS = 32'h00000005;
// localparam MCAUSE_EXC_J_ACCESS = 32'h00000007;
localparam MSTATUS_ENABLE = 32'h00000008;
localparam MSTATUS_UNABLE = 32'h00000000;
// CSRRegs
wire [31:0] csr_rdata;
reg [31:0] csr_wdata;
reg [11:0] csr_raddr, csr_waddr;
reg csr_w;
reg [1:0] csr_wsc_mode;
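always @(*) begin
// Default values for every output, so that no branch below infers a latch
expt_int = 1'b0;
pc_change = 1'b0;
pc = pc_current;
mepc_bypass_in = 32'b0;
mcause_bypass_in = 32'b0;
mtval_bypass_in = 32'b0;
mstatus_bypass_in = MSTATUS_ENABLE;
if(rst) begin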
expt_int = 1'b0;
mepc_bypass_in = 32'b0;
mcause_bypass_in = 32'b0;
mtval_bypass_in = 32'b0;
mstatus_bypass_in = MSTATUS_ENABLE;
pc = pc_current;
pc_change = 1'b0;
end
else begin
if(mstatus == MSTATUS_ENABLE) begin // enabled
if (INT) begin
expt_int = 1'b1;
pc = mtvec;
pc_change = 1'b1;
mcause_bypass_in = MCAUSE_INT_EXTERNAL;
mstatus_bypass_in = MSTATUS_UNABLE;
mepc_bypass_in = pc_current;
mtval_bypass_in = 32'b0;
end else if (ecall) begin
expt_int = 1'b1;
pc = mtvec;
pc_change = 1'b1;
mcause_bypass_in = MCAUSE_EXC_ECALL;
mstatus_bypass_in = MSTATUS_UNABLE;
mepc_bypass_in = pc_current;
mtval_bypass_in = 32'b0;
end else if (illegal_inst) begin
expt_int = 1'b1;
pc = mtvec;
pc_change = 1'b1;
mcause_bypass_in = MCAUSE_EXC_ILLEGAL;
mstatus_bypass_in = MSTATUS_UNABLE;
mepc_bypass_in = pc_current;
mtval_bypass_in = inst;
end
else begin
expt_int = 1'b0;
pc_change = 1'b0;
end
end
else begin
if (mret) begin
expt_int = 1'b1;
pc = mepc;
pc_change = 1'b1;
mcause_bypass_in = 32'b0;
mstatus_bypass_in = MSTATUS_ENABLE; // clear mark
mepc_bypass_in = 32'b0;
mtval_bypass_in = 32'b0;
end
else begin
expt_int = 1'b0;
pc_change = 1'b0;
end
end
end
end
assign en = ~expt_int;
endmodule
The mcause values follow the standard encoding, namely:
localparam MCAUSE_INT_EXTERNAL = 32'h8000000B;
localparam MCAUSE_EXC_ECALL = 32'h0000000B; // ECALL
localparam MCAUSE_EXC_ILLEGAL = 32'h00000002;
A most significant bit of 1 marks the cause as an interrupt (here, the external interrupt).
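The trap-entry decision made by RV_INT (traps are only taken while mstatus equals the enabled value, and the external interrupt has priority over ecall, which has priority over an illegal instruction) can be summarised by the following Python model. It is a behavioural sketch only; trap_entry and the dictionary layout are hypothetical, while the constants are the ones listed above.

```python
MSTATUS_ENABLE = 0x00000008
MSTATUS_UNABLE = 0x00000000
MCAUSE = {
    "external": 0x8000000B,
    "ecall": 0x0000000B,
    "illegal": 0x00000002,
}

def trap_entry(mstatus, pc, mtvec, inst, INT=False, ecall=False, illegal=False):
    """Return None if no trap is taken, else the CSR values written through
    the bypass inputs together with the redirected pc."""
    if mstatus != MSTATUS_ENABLE:
        return None                      # already inside a trap: nothing new is taken
    if INT:
        kind = "external"
    elif ecall:
        kind = "ecall"
    elif illegal:
        kind = "illegal"
    else:
        return None
    return {
        "mcause": MCAUSE[kind],
        "mepc": pc,
        "mtval": inst if kind == "illegal" else 0,
        "mstatus": MSTATUS_UNABLE,
        "next_pc": mtvec,
    }

print(trap_entry(0x8, 0x54, 0x7C, 0xFFFFFFFF, illegal=True))
print(trap_entry(0x0, 0x80, 0x7C, 0x00000013, INT=True))   # masked while in a trap
```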
3. SCPU_Ctrl code:
module SCPU_ctrls(
input [4:0] OPcode,
input [2:0] Fun3,
input Fun7,
input MIO_ready,
input [6:0] High7,
output reg [2:0] ImmSel,
output reg ALUSrc_B,
output reg [2:0] MemtoReg,
output reg [1:0] Jump,
output reg [2:0] Branch,
output reg RegWrite,
output reg MemRW,
output reg [3:0] WHBU,
output reg [3:0] ALU_Control,
output reg csr_w,
output reg is_csri,
output reg [1:0] csr_wsc_mode,
output reg mret,
output reg ecall,
output reg illegal,
output reg CPU_MIO
);
initial begin
ImmSel = 3'b000;
ALUSrc_B = 1'b0;
MemtoReg = 3'b000; // 0: ALU result 1: Load from RAM to reg 2/3: PC4, JAL
MemRW = 1'b0; // write to/read from RAM?
WHBU = 4'b0;
Jump = 2'b00;
Branch = 3'b000;
RegWrite = 1'b0;
ALU_Control = 4'b0000;
//CPU_MIO = 1'b0;
end
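always @(*) begin
// Defaults: no CSR write and no trap request; the CSR case below overrides them.
// Without these, csr_w would stay at 1 after the first CSR instruction.
csr_w = 1'b0; is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b00;
case (OPcode)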
5'b01100: begin // R-type
is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b00;
illegal = 1'b0;
WHBU = 4'b0;
ALUSrc_B = 1'b0;
MemtoReg = 3'b000;
RegWrite = 1'b1;
//ImmSel = 2'b00; // no imm
//MemRW = 1'bx; // no read or write
Jump = 2'b00;
Branch = 3'b000;
case ({Fun3, Fun7})
4'b0000: ALU_Control = 4'b0000; // ADD
4'b0001: ALU_Control = 4'b0001; // SUB
4'b0010: ALU_Control = 4'b0010; // SLL
4'b0100: ALU_Control = 4'b0011; // SLT
4'b0110: ALU_Control = 4'b0100; // SLTU
4'b1000: ALU_Control = 4'b0101; // XOR
4'b1010: ALU_Control = 4'b0110; // SRL
4'b1011: ALU_Control = 4'b0111; // SRA
4'b1100: ALU_Control = 4'b1000; // OR
4'b1110: ALU_Control = 4'b1001; // AND
default: ALU_Control = 4'b0000;
endcase
end
5'b11100: begin // I-type csr
illegal = 1'b0;
WHBU = 4'b0;
ALUSrc_B = 1'b0;
MemtoReg = 3'b101;
RegWrite = 1'b1; // write into rd
ImmSel = 3'b101;
Jump = 2'b00;
Branch = 3'b000;
csr_w = 1'b1;
case (Fun3)
3'b000: begin
//expt_int = 1'b1;
if (High7 == 7'b0011000) begin // mret
mret = 1'b1;
ecall = 1'b0;
end
else begin // ecall
mret = 1'b0;
ecall = 1'b1;
end
end
3'b001: begin is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b00;end// csrrw
3'b010: begin is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b01;end// csrrs
3'b011: begin is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b10;end// csrrc
3'b101: begin is_csri = 1'b1; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b00;end// csrrwi
3'b110: begin is_csri = 1'b1; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b01;end// csrrsi
3'b111: begin is_csri = 1'b1; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b10;end// csrrci
default: begin is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b00;end
endcase
end
5'b00000: begin // Load
is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b00;
illegal = 1'b0;
ALUSrc_B = 1'b1;
MemtoReg = 3'b001;
RegWrite = 1'b1;
ImmSel = 3'b000;
MemRW = 1'b0; // read
case (Fun3)
3'b000: WHBU = 4'b0010; // LB
3'b001: WHBU = 4'b0100; // LH
3'b010: WHBU = 4'b1000; // LW
3'b100: WHBU = 4'b0011; // LBU
3'b101: WHBU = 4'b0101; // LHU
default: WHBU = 4'b0000;
endcase
Jump = 2'b00;
Branch = 3'b000;
ALU_Control = 4'b0000; // ADD for address calculation
end
5'b01000: begin // Store
is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b00;
illegal = 1'b0;
ALUSrc_B = 1'b1;
//MemtoReg = 2'b01;
MemtoReg = 3'bx;
//RegWrite = 1'b1;
RegWrite = 1'b0;
ImmSel = 3'b001;
MemRW = 1'b1; // write
case (Fun3)
3'b000: WHBU = 4'b0010; // SB
3'b001: WHBU = 4'b0100; // SH
3'b010: WHBU = 4'b1000; // SW
default: WHBU = 4'b0000;
endcase
Jump = 2'b00;
Branch = 3'b000;
ALU_Control = 4'b0000; // ADD for address calculation
end
5'b11000: begin // Branch
is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b00;
illegal = 1'b0;
WHBU = 4'b0000;
ALUSrc_B = 1'b0;
MemtoReg = 3'bx; // new
RegWrite = 1'b0; //
ImmSel = 3'b010;
//MemRW = 1'bx;
//Branch = 1'b1;
Jump = 2'b00;
case (Fun3)
3'b000: begin Branch = 3'b001; ALU_Control = 4'd1; end // BEQ, do SUB in ALU
3'b001: begin Branch = 3'b010; ALU_Control = 4'd1; end // BNE
3'b100: begin Branch = 3'b011; ALU_Control = 4'd3; end // BLT, do SLT in ALU
3'b101: begin Branch = 3'b100; ALU_Control = 4'd3; end // BGE
3'b110: begin Branch = 3'b101; ALU_Control = 4'd4; end // BLTU, do SLTU in ALU
3'b111: begin Branch = 3'b110; ALU_Control = 4'd4; end // BGEU
default: begin Branch = 3'b000; ALU_Control = 4'd0; end
endcase
end
5'b11011: begin // JAL
is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b00;
illegal = 1'b0;
WHBU = 4'b0000;
//ALUSrc_B = 1'b1;
ALUSrc_B = 1'bx;
MemtoReg = 3'b010; // PC + 4
RegWrite = 1'b1; //
ImmSel = 3'b011;
// MemRW = 1'bx;
Jump = 2'b01;
Branch = 3'b000;
end
5'b00100: begin // I-type ALU
illegal = 1'b0;
WHBU = 4'b0000;
ALUSrc_B = 1'b1;
MemtoReg = 3'b000;
RegWrite = 1'b1;
ImmSel = 3'b000;
//MemRW = 1'bx;
Jump = 2'b00;
Branch = 3'b000;
case (Fun3)
3'b000: ALU_Control = 4'b0000; // ADDI
3'b001: ALU_Control = 4'b0010; // SLLI
3'b010: ALU_Control = 4'b0011; // SLTI
3'b011: ALU_Control = 4'b0100; // SLTIU
3'b100: ALU_Control = 4'b0101; // XORI
3'b101: ALU_Control = (Fun7) ? 4'b0111 : 4'b0110; // SRAI / SRLI
3'b110: ALU_Control = 4'b1000; // ORI
3'b111: ALU_Control = 4'b1001; // ANDI
default: ALU_Control = 4'b0000;
endcase
end
5'b11001: begin // I-type JALR
is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b00;
illegal = 1'b0;
WHBU = 4'b0000;
ALUSrc_B = 1'b1;
MemtoReg = 3'b010; // PC + 4
RegWrite = 1'b1; //
ImmSel = 3'b000;
//MemRW = 1'b0;
MemRW = 1'bx;
Jump = 2'b10;
Branch = 3'b000;
ALU_Control = 4'd0; // ADD
end
5'b01101: begin // lui
is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b00;
illegal = 1'b0;
WHBU = 4'b0000;
ALUSrc_B = 1'bx;
MemtoReg = 3'b11; // lui_res = Imm
RegWrite = 1'b1; //
ImmSel = 3'b100; // U-type
MemRW = 1'bx;
Jump = 2'b00;
Branch = 3'b000;
ALU_Control = 4'dx; // ADD
end
5'b00101: begin // auipc
is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b00;
illegal = 1'b0;
WHBU = 4'b0000;
ALUSrc_B = 1'bx;
MemtoReg = 3'b100; // auipc_res = PC + Imm
RegWrite = 1'b1; //
ImmSel = 3'b100;
MemRW = 1'bx;
Jump = 2'b00;
Branch = 3'b000;
ALU_Control = 4'dx; // ADD
end
default: illegal = 1'b1; // illegal instruction
endcase
end
endmodule
The modified part is:
always @(*) begin
case (OPcode)
5'b11100: begin // I-type csr
illegal = 1'b0;
WHBU = 4'b0;
ALUSrc_B = 1'b0;
MemtoReg = 3'b101;
RegWrite = 1'b1; // write into rd
ImmSel = 3'b101;
Jump = 2'b00;
Branch = 3'b000;
csr_w = 1'b1;
case (Fun3)
3'b000: begin
if (High7 == 7'b0011000) begin // mret
mret = 1'b1;
ecall = 1'b0;
end
else begin // ecall
mret = 1'b0;
ecall = 1'b1;
end
end
3'b001: begin is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b00;end// csrrw
3'b010: begin is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b01;end// csrrs
3'b011: begin is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b10;end// csrrc
3'b101: begin is_csri = 1'b1; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b00;end// csrrwi
3'b110: begin is_csri = 1'b1; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b01;end// csrrsi
3'b111: begin is_csri = 1'b1; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b10;end// csrrci
default: begin is_csri = 1'b0; mret = 1'b0; ecall = 1'b0; csr_wsc_mode = 2'b00;end
endcase
end
endcase
end
4. DataPath code:
module DataPaths(
input [2:0] ImmSel, input ALUSrc_B, input [2:0] MemtoReg, input [1:0] Jump,
input [2:0] Branch, input RegWrite, input [3:0] ALU_Control, input is_csri,
input [1:0] csr_wsc_mode,
input csr_w,
input illegal, input mret, input ecall,
input INT, // external interruption
input [31:0] Data_in, input clk, input [31:0] inst_field, input rst, input [3:0] WHBU,
output [31:0] Reg00, output [31:0] Reg01, output [31:0] Reg02, output [31:0] Reg03,
output [31:0] Reg04, output [31:0] Reg05, output [31:0] Reg06, output [31:0] Reg07,
output [31:0] Reg08, output [31:0] Reg09, output [31:0] Reg10, output [31:0] Reg11,
output [31:0] Reg12, output [31:0] Reg13, output [31:0] Reg14, output [31:0] Reg15,
output [31:0] Reg16, output [31:0] Reg17, output [31:0] Reg18, output [31:0] Reg19,
output [31:0] Reg20, output [31:0] Reg21, output [31:0] Reg22, output [31:0] Reg23,
output [31:0] Reg24, output [31:0] Reg25, output [31:0] Reg26, output [31:0] Reg27,
output [31:0] Reg28, output [31:0] Reg29, output [31:0] Reg30, output [31:0] Reg31,
output [31:0] ALU_out, output reg [31:0] Data_out, output [31:0] PC_out,
output wire [3:0] out_wea);
reg branch;
wire Branch_one;
wire [31:0] branch_out;
wire ALU_zero;
wire ALU_overflow;
wire [31:0] PCAddImm;
wire [31:0] ALU_A;
wire [31:0] ALU_B;
reg [31:0] reg_wt_data;
wire [31:0] PCAdd4;
reg [31:0] PC_in;
wire [31:0] ImmOut;
wire [1:0] LSwea;
wire expt_int;
reg [3:0] wea;
wire [31:0] rdata;
reg [31:0] LoadData;
wire [31:0] ReadData;
wire [31:0] mepc_bypass_in;
wire [31:0] mcause_bypass_in;
wire [31:0] mtval_bypass_in;
wire [31:0] mstatus_bypass_in;
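assign LSwea = ImmOut[1:0]; // byte offset taken from the immediate only; correct as long as the base register (rs1) is word-aligned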
assign PCAdd4 = PC_out + 32'd4;
assign Branch_one = (Branch == 3'b000) ? 1'b0 : 1'b1;
assign PCAddImm = ImmOut + PC_out; // ALU ? Adder!
always @ (*) begin
case (MemtoReg)
3'd0: reg_wt_data = ALU_out; // ALU
3'd1: begin
case (WHBU)
4'b0010: reg_wt_data = {{24{LoadData[7]}}, LoadData[7:0]}; // LB
4'b0100: reg_wt_data = {{16{LoadData[15]}}, LoadData[15:0]}; // LH
4'b1000: reg_wt_data = LoadData; // LW
4'b0011: reg_wt_data = {24'b0, LoadData[7:0]}; // LBU
4'b0101: reg_wt_data = {16'b0, LoadData[15:0]}; // LHU
default: reg_wt_data = 0;
endcase
end
3'd2: reg_wt_data = PCAdd4; // JAL
3'd3: reg_wt_data = ImmOut; // lui
3'd4: reg_wt_data = PCAddImm; // auipc
3'd5: reg_wt_data = rdata; // csr
default: reg_wt_data = 32'b0;
endcase
end
always @ (*) begin
case (Branch)
3'b001: branch = Branch_one & (ALU_zero); // BEQ
3'b010: branch = Branch_one & (~ALU_zero); // BNE
3'b011: branch = Branch_one & (ALU_out[0]); // BLT
3'b100: branch = Branch_one & (~ALU_out[0]); // BGE
3'b101: branch = Branch_one & (ALU_out[0]); // BLTU
3'b110: branch = Branch_one & (~ALU_out[0]); // BGEU
default: branch = 1'b0;
endcase
case (Jump)
2'b00: PC_in = branch_out;
2'b01: PC_in = PCAddImm;
2'b10: PC_in = ALU_out;
default: PC_in = 32'b0;
//2'b11: PC_in = ALU_out;
endcase
end
always @ (*) begin
case (LSwea)
2'b00: begin
LoadData = Data_in;
Data_out = ReadData;
case (WHBU)
4'b0010: wea = 4'b0001;
4'b0100: wea = 4'b0011;
4'b1000: wea = 4'b1111;
default: wea = 0;
endcase
end
2'b01: begin
LoadData = {8'b0, Data_in[31:8]};
Data_out = {ReadData[23:0], 8'b0};
case (WHBU)
4'b0010: wea = 4'b0010;
4'b0100: wea = 4'b0110;
//4'b1000: wea = 4'b1111;
default: wea = 0;
endcase
end
2'b10: begin
LoadData = {16'b0, Data_in[31:16]};
Data_out = {ReadData[15:0], 16'b0};
case (WHBU)
4'b0010: wea = 4'b0100;
4'b0100: wea = 4'b1100;
//4'b1000: wea = 4'b1111;
default: wea = 0;
endcase
end
2'b11: begin
LoadData = {24'b0, Data_in[31:24]};
Data_out = {ReadData[7:0], 24'b0};
case (WHBU)
4'b0010: wea = 4'b1000;
//4'b0100: wea = 4'b1100;
//4'b1000: wea = 4'b1111;
default: wea = 0;
endcase
end
endcase
end
wire [31:0] wdata;
assign ALU_B = ALUSrc_B ? ImmOut : ReadData;
assign branch_out = branch ? PCAddImm : PCAdd4;
assign wdata = is_csri ? ImmOut : ALU_A; // uimm or rs1
wire pc_change;
wire en;
ALU alu(.A(ALU_A), .B(ALU_B), .ALU_operation(ALU_Control), .res(ALU_out),
.zero(ALU_zero), .overflow(ALU_overflow));
wire [4:0] RS1, RS2, WT;
assign RS1 = inst_field[19:15];
assign RS2 = inst_field[24:20];
assign WT = inst_field[11:7];
Regs regs(.clk(clk), .rst(rst), .Rs1_addr(RS1), .Rs2_addr(RS2),
.Wt_addr(WT), .Wt_data(reg_wt_data), .RegWrite(RegWrite & en),
.Rs1_data(ALU_A), .Rs2_data(ReadData),
.Reg00(Reg00),.Reg01(Reg01),.Reg02(Reg02),.Reg03(Reg03),
.Reg04(Reg04),.Reg05(Reg05),.Reg06(Reg06),.Reg07(Reg07),
.Reg08(Reg08),.Reg09(Reg09),.Reg10(Reg10),.Reg11(Reg11),
.Reg12(Reg12),.Reg13(Reg13),.Reg14(Reg14),.Reg15(Reg15),
.Reg16(Reg16),.Reg17(Reg17),.Reg18(Reg18),.Reg19(Reg19),
.Reg20(Reg20),.Reg21(Reg21),.Reg22(Reg22),.Reg23(Reg23),
.Reg24(Reg24),.Reg25(Reg25),.Reg26(Reg26),.Reg27(Reg27),
.Reg28(Reg28),.Reg29(Reg29),.Reg30(Reg30),.Reg31(Reg31));
wire [11:0] addr;
wire [31:0] mstatus;
wire [31:0] mtvec;
wire [31:0] mepc;
assign addr = inst_field[31:20];
CSRRegs csrregs(.clk(clk), .rst(rst), .raddr(addr),
.waddr(addr), .wdata(wdata), .rdata(rdata), .csr_w(csr_w),
.csr_wsc_mode(csr_wsc_mode), .expt_int(expt_int), .mepc_bypass_in(mepc_bypass_in),
.mcause_bypass_in(mcause_bypass_in), .mtval_bypass_in(mtval_bypass_in), .mstatus_bypass_in(mstatus_bypass_in),
.mstatus(mstatus), .mtvec(mtvec), .mepc(mepc)
);
wire [3:0] EN;
assign EN = {en, en, en, en};
assign out_wea = wea & EN;
wire [31:0] INTPC;
RV_INT rv_int(.clk(clk), .rst(rst), .INT(INT), .expt_int(expt_int), .ecall(ecall), .mret(mret),
.illegal_inst(illegal), .pc_current(PC_out), .en(en), .pc(INTPC), .pc_change(pc_change),
.mepc_bypass_in(mepc_bypass_in), .mcause_bypass_in(mcause_bypass_in),
.mtval_bypass_in(mtval_bypass_in), .mstatus_bypass_in(mstatus_bypass_in),
.mstatus(mstatus), .mtvec(mtvec), .mepc(mepc), .inst(inst_field));
ImmGen immgen(.ImmSel(ImmSel), .inst_field(inst_field), .Imm_out(ImmOut));
wire [31:0] PC_IN;
assign PC_IN = pc_change ? INTPC : PC_in;
Reg PC(.clk(clk), .rst(rst), .CE(1'b1), .D(PC_IN), .Q(PC_out));
endmodule
The modified part is:
wire [31:0] wdata;
assign wdata = is_csri ? ImmOut : ALU_A; // uimm or rs1
wire pc_change;
wire [11:0] addr;
wire [31:0] mstatus;
wire [31:0] mtvec;
wire [31:0] mepc;
assign addr = inst_field[31:20];
CSRRegs csrregs(.clk(clk), .rst(rst), .raddr(addr),
.waddr(addr), .wdata(wdata), .rdata(rdata), .csr_w(csr_w),
.csr_wsc_mode(csr_wsc_mode), .expt_int(expt_int), .mepc_bypass_in(mepc_bypass_in),
.mcause_bypass_in(mcause_bypass_in), .mtval_bypass_in(mtval_bypass_in), .mstatus_bypass_in(mstatus_bypass_in),
.mstatus(mstatus), .mtvec(mtvec), .mepc(mepc)
);
wire [31:0] INTPC;
RV_INT rv_int(.clk(clk), .rst(rst), .INT(INT), .expt_int(expt_int), .ecall(ecall), .mret(mret),
.illegal_inst(illegal), .pc_current(PC_out), .en(en), .pc(INTPC), .pc_change(pc_change),
.mepc_bypass_in(mepc_bypass_in), .mcause_bypass_in(mcause_bypass_in),
.mtval_bypass_in(mtval_bypass_in), .mstatus_bypass_in(mstatus_bypass_in),
.mstatus(mstatus), .mtvec(mtvec), .mepc(mepc), .inst(inst_field));
wire [31:0] PC_IN;
assign PC_IN = pc_change ? INTPC : PC_in;
Reg PC(.clk(clk), .rst(rst), .CE(1'b1), .D(PC_IN), .Q(PC_out));
Key simulation steps
The simulation testbench is the same as in Lab 4-2
2. Simulation test program:
j start # 00
dummy:
nop # 04
nop # 08
nop # 0C
nop # 10
nop # 14
nop # 18
nop # 1C
j dummy
start:
addi x1, x0, 1
add x2, x1, x0
add x3, x2, x1
add x4, x3, x2
add x5, x4, x3
add x6, x5, x4
addi x7, x0, 0x7c
csrrs x0, 0x305, x7
csrrc x0, 0x341, x3
csrrci x0, 0x342, 4
csrrsi x0, 0x343, 5
csrrwi x21, 0x300, 8 # mstatus = 8 (MIE set)
# Here will be an illegal instruction
add x7, x6, x5
add x8, x7, x6
add x9, x8, x7
ecall
add x10, x9, x8
add x11, x10, x9
add x12, x11, x10
pass_1:
li x31, 0x666
j dummy
trap:
csrrs x21, 0x300, x0 # mstatus
csrrs x22, 0x305, x0 # mtvec
csrrs x23, 0x341, x0 # mepc
csrrs x24, 0x342, x0 # mcause
csrrs x25, 0x343, x0 # mtval
lui x26, 0x80000
addi x26, x26, 0x00B
beq x24, x26, return
illegal_ecall:
addi x23, x23, 4
csrrw x0, 0x341, x23
beq x0, x0, return
return:
add x0, x0, x0
mret # return to the address held in mepc
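This is a basic test program for the CSR instructions, and it includes the trap code.
In the waveform simulation, a hardware interrupt, an illegal instruction and a software interrupt are all injected.
Experimental results and analysis
Simulation results
As can be seen, when the first illegal instruction ffffffff appears, the program saves mepc=0x54 and jumps to the trap program.
Inside the trap, the five trap CSRs are copied into general-purpose registers and mepc is changed to 0x58: since this is not a hardware interrupt, mepc = mepc + 4.
The program then returns to pc=0x58 and continues running.
Next, the program reaches the ecall instruction; it likewise saves mepc=0x64 and jumps to the trap program.
Inside the trap, the five trap CSRs are copied into general-purpose registers and mepc is changed to 0x68: since this is not a hardware interrupt, mepc = mepc + 4.
During this time INT is set to 1, i.e. an external interrupt is asserted, but no new trap is taken because the program is already inside a trap; this matches the requirement that a new interrupt must not be triggered.
The program then returns to pc=0x68 and continues running.
Later the program encounters another external interrupt; it likewise saves mepc=0x6c and jumps to the trap program.
Inside the trap, the five trap CSRs are copied into general-purpose registers and mepc stays at 0x6c: since this is a hardware interrupt, mepc = mepc.
The program then returns to pc=0x6c and continues running.
Finally, execution reaches pc=0x78, the main program ends with Reg31=0x666 and enters the dummy loop, which shows that the test passed and the result matches expectations.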
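Discussion question
When a large immediate has to be loaded, we often think of using lui & addi; for example, the following code assigns 0x22223333 to t0:
lui t0, 0x22223
addi t0, t0, 0x333
Can you obtain 0xDEADBEEF with the following code? If you think not, first explain why, then modify one character of the code so that it correctly produces 0xDEADBEEF.
lui t1, 0xDEADB
addi t1, t1, -273 // 0xEEF
Answer:
The code above yields 0xDEADAEEF. The reason is that addi sign-extends the 12-bit immediate in the instruction to 32 bits, so -273 becomes the immediate 0xFFFFFEEF; adding it to the 0xDEADB000 produced by lui gives 0xDEADAEEF.
Changing a single character fixes this:
lui t1, 0xDEADC
addi t1, t1, -273 // 0xEEF
The upper 20 bits, 0xDEADC, are now added to 0xFFFFF (with the carry out of bit 31 discarded) to give 0xDEADB, so the final result is 0xDEADBEEF.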
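The arithmetic in this answer can be checked with a short Python sketch. The helper names (sign_extend_12, lui_addi, split_for_lui_addi) are hypothetical; the model only assumes the RV32I rules that lui places a 20-bit value in bits 31..12 and that addi sign-extends its 12-bit immediate.

```python
def sign_extend_12(imm12):
    """Interpret the low 12 bits of imm12 as a signed value."""
    imm12 &= 0xFFF
    return imm12 - 0x1000 if imm12 & 0x800 else imm12

def lui_addi(upper20, imm12):
    """Result of `lui rd, upper20` followed by `addi rd, rd, imm12` (32-bit wraparound)."""
    return ((upper20 << 12) + sign_extend_12(imm12)) & 0xFFFFFFFF

def split_for_lui_addi(value):
    """Choose (upper20, imm12) so that lui_addi reproduces value."""
    lo = sign_extend_12(value)
    hi = ((value - lo) >> 12) & 0xFFFFF   # compensates for a negative lo
    return hi, lo

print(hex(lui_addi(0xDEADB, -273)))       # the code from the question
print(hex(lui_addi(0xDEADC, -273)))       # the one-character fix
hi, lo = split_for_lui_addi(0xDEADBEEF)
print(hex(hi), lo)
```

split_for_lui_addi shows the general rule behind the fix: whenever bit 11 of the target value is set, the upper 20 bits passed to lui must be one larger than the upper 20 bits of the target.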
PCPU
Lab 5-1
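Procedure and experimental steps
Code design hierarchy and description
In the pipelined CPU, the single-cycle CPU is split into \(5\) stages: IF, ID, EX, MEM and WB.
The important signals of each stage are passed to the next stage through stage registers.
- The IF stage selects the next PC according to PCSrc and fetches the corresponding instruction from ROM.
- The ID stage contains SCPU_Ctrl, which generates the control signals, the RegFile, which reads and writes register values, and ImmGen, which generates the immediate for the instruction. It also produces StoreData, the value to be stored by store instructions.
- The EX stage contains the ALU, which performs the computation.
- The MEM stage outputs the value to be stored into RAM and passes the value read from RAM on to the next stage. It also generates PCSrc for branches and jumps.
- The WB stage produces the value to be written to the register file and sends it back to the ID stage.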
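To make the five-stage split concrete, the following Python sketch lists which instruction occupies which stage in each clock cycle. It models an ideal pipeline only, assuming no hazards, stalls or taken branches; pipeline_schedule is a hypothetical helper and is not part of the lab code.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_schedule(n_instructions):
    """For each cycle, map stage name -> index of the instruction in that stage."""
    total_cycles = n_instructions + len(STAGES) - 1
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(STAGES):
            index = cycle - depth          # instruction i enters IF in cycle i
            if 0 <= index < n_instructions:
                row[stage] = index
        schedule.append(row)
    return schedule

for cycle, row in enumerate(pipeline_schedule(3)):
    print(cycle, row)
```

With n instructions the ideal pipeline needs n + 4 cycles, since the last instruction still has to pass through the remaining four stages after it is fetched.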
Source code
- Top-level pipelined CPU code:
module Pipeline_CPU(
input clk,
input rst,
input [31:0] Data_in,
input [31:0] inst_IF,
output [31:0] Addr_out,
output [31:0] Data_out,
output [31:0] Data_out_WB,
output [31:0] PC_out_IF,
output [31:0] inst_ID,
output [31:0] PC_out_ID,
output [31:0] PC_out_EX,
output MemRW_EX,
output MemRW_Mem,
output [3:0] wea,
output [31:0] Reg00, output [31:0] Reg01,output [31:0] Reg02,output [31:0] Reg03,
output [31:0] Reg04, output [31:0] Reg05,output [31:0] Reg06,output [31:0] Reg07,
output [31:0] Reg08, output [31:0] Reg09,output [31:0] Reg10,output [31:0] Reg11,
output [31:0] Reg12, output [31:0] Reg13,output [31:0] Reg14,output [31:0] Reg15,
output [31:0] Reg16, output [31:0] Reg17,output [31:0] Reg18,output [31:0] Reg19,
output [31:0] Reg20, output [31:0] Reg21,output [31:0] Reg22,output [31:0] Reg23,
output [31:0] Reg24, output [31:0] Reg25,output [31:0] Reg26,output [31:0] Reg27,
output [31:0] Reg28, output [31:0] Reg29,output [31:0] Reg30,output [31:0] Reg31
);
wire [31:0] PC_out_EXMem, ALU_out_EXMem;
wire [4:0] Rd_addr_out_MemWB;
wire RegWrite_out_MemWB;
wire [2:0] PCSrc;
Pipeline_IF PPLIF (
.clk_IF(clk),
.rst_IF(rst),
.en_IF(1'b1),
.PC_in_IF(PC_out_EXMem),
.PCSrc(PCSrc),
.ALU_in_IF(ALU_out_EXMem),
.PC_out_IF(PC_out_IF)
);
wire [31:0] PC_out_IFID, inst_out_IFID;
IF_reg_ID PPLIFID (
.clk_IFID(clk),
.rst_IFID(rst),
.en_IFID(1'b1),
.PC_in_IFID(PC_out_IF),
.inst_in_IFID(inst_IF),
.PC_out_IFID(PC_out_IFID),
.inst_out_IFID(inst_out_IFID)
);
wire [31:0] Rs1_out_ID, Rs2_out_ID,
Imm_out_ID;
wire ALUSrc_B_ID, MemRW_ID, RegWrite_out_ID;
wire [4:0] Rd_addr_out_ID;
wire [3:0] ALU_control_ID;
wire [2:0] Branch_ID;
wire [1:0] Jump_ID;
wire [2:0] MemtoReg_ID;
wire [3:0] WHBU_ID;
wire [1:0] LSwea_ID;
wire [31:0] StoreData_ID;
wire [3:0] wea_ID;
Pipeline_ID PPLID (
.clk_ID(clk),
.rst_ID(rst),
.RegWrite_in_ID(RegWrite_out_MemWB),
.Rd_addr_ID(Rd_addr_out_MemWB),
.Wt_data_ID(Data_out_WB),
.inst_in_ID(inst_out_IFID),
.Rd_addr_out_ID(Rd_addr_out_ID),
.Rs1_out_ID(Rs1_out_ID),
.Rs2_out_ID(Rs2_out_ID),
.Imm_out_ID(Imm_out_ID),
.ALUSrc_B_ID(ALUSrc_B_ID),
.ALU_control_ID(ALU_control_ID),
.Branch_ID(Branch_ID),
.MemRW_ID(MemRW_ID),
.Jump_ID(Jump_ID),
.MemtoReg_ID(MemtoReg_ID),
.RegWrite_out_ID(RegWrite_out_ID),
.WHBU_ID(WHBU_ID),
.LSwea_ID(LSwea_ID),
.StoreData_ID(StoreData_ID),
.wea_ID(wea_ID),
.Reg00(Reg00),.Reg01(Reg01),.Reg02(Reg02),.Reg03(Reg03),
.Reg04(Reg04),.Reg05(Reg05),.Reg06(Reg06),.Reg07(Reg07),
.Reg08(Reg08),.Reg09(Reg09),.Reg10(Reg10),.Reg11(Reg11),
.Reg12(Reg12),.Reg13(Reg13),.Reg14(Reg14),.Reg15(Reg15),
.Reg16(Reg16),.Reg17(Reg17),.Reg18(Reg18),.Reg19(Reg19),
.Reg20(Reg20),.Reg21(Reg21),.Reg22(Reg22),.Reg23(Reg23),
.Reg24(Reg24),.Reg25(Reg25),.Reg26(Reg26),.Reg27(Reg27),
.Reg28(Reg28),.Reg29(Reg29),.Reg30(Reg30),.Reg31(Reg31)
);
wire [31:0] PC_out_IDEX, Rs1_out_IDEX, Rs2_out_IDEX, Imm_out_IDEX;
wire [4:0] Rd_addr_out_IDEX;
wire ALUSrc_B_out_IDEX, MemRW_out_IDEX, RegWrite_out_IDEX;
wire [3:0] ALU_control_out_IDEX;
wire [2:0] Branch_out_IDEX;
wire [1:0] Jump_out_IDEX;
wire [2:0] MemtoReg_out_IDEX;
wire [3:0] WHBU_out_IDEX;
wire [1:0] LSwea_out_IDEX;
wire [31:0] StoreData_out_IDEX;
wire [3:0] wea_out_IDEX;
ID_reg_Ex PPLIDEx (
.clk_IDEX(clk),
.rst_IDEX(rst),
.en_IDEX(1'b1),
.PC_in_IDEX(PC_out_IFID),
.Rd_addr_IDEX(Rd_addr_out_ID),
.Rs1_in_IDEX(Rs1_out_ID),
.Rs2_in_IDEX(Rs2_out_ID),
.Imm_in_IDEX(Imm_out_ID),
.ALUSrc_B_in_IDEX(ALUSrc_B_ID),
.ALU_control_in_IDEX(ALU_control_ID),
.Branch_in_IDEX(Branch_ID),
.Jump_in_IDEX(Jump_ID),
.MemRW_in_IDEX(MemRW_ID),
.MemtoReg_in_IDEX(MemtoReg_ID),
.RegWrite_in_IDEX(RegWrite_out_ID),
.WHBU_in_IDEX(WHBU_ID),
.LSwea_in_IDEX(LSwea_ID),
.StoreData_in_IDEX(StoreData_ID),
.wea_in_IDEX(wea_ID),
.PC_out_IDEX(PC_out_IDEX),
.Rd_addr_out_IDEX(Rd_addr_out_IDEX),
.Rs1_out_IDEX(Rs1_out_IDEX),
.Rs2_out_IDEX(Rs2_out_IDEX),
.Imm_out_IDEX(Imm_out_IDEX),
.ALUSrc_B_out_IDEX(ALUSrc_B_out_IDEX),
.ALU_control_out_IDEX(ALU_control_out_IDEX),
.Branch_out_IDEX(Branch_out_IDEX),
.Jump_out_IDEX(Jump_out_IDEX),
.MemRW_out_IDEX(MemRW_out_IDEX),
.MemtoReg_out_IDEX(MemtoReg_out_IDEX),
.RegWrite_out_IDEX(RegWrite_out_IDEX),
.WHBU_out_IDEX(WHBU_out_IDEX),
.LSwea_out_IDEX(LSwea_out_IDEX),
.StoreData_out_IDEX(StoreData_out_IDEX),
.wea_out_IDEX(wea_out_IDEX)
);
wire [31:0] PC4_out_EX, zero_out_EX, ALU_out_EX, Rs2_out_EX;
Pipeline_Ex PPLEx (
.PC_in_EX(PC_out_IDEX),
.Rs1_in_EX(Rs1_out_IDEX),
.Rs2_in_EX(Rs2_out_IDEX),
.Imm_in_EX(Imm_out_IDEX),
.ALUSrc_B_in_EX(ALUSrc_B_out_IDEX),
.ALU_control_in_EX(ALU_control_out_IDEX),
.PC_out_EX(PC_out_EX),
.PC4_out_EX(PC4_out_EX),
.zero_out_EX(zero_out_EX),
.ALU_out_EX(ALU_out_EX),
.Rs2_out_EX(Rs2_out_EX)
);
wire [31:0] PC4_out_EXMem, Rs2_out_EXMem, Imm_out_EXMem;
wire [4:0] Rd_addr_out_EXMem;
wire zero_out_EXMem, MemRW_out_EXMem, RegWrite_out_EXMem;
wire [2:0] Branch_out_EXMem;
wire [1:0] Jump_out_EXMem;
wire [2:0] MemtoReg_out_EXMem;
wire [3:0] WHBU_out_EXMem;
wire [1:0] LSwea_out_EXMem;
wire [31:0] StoreData_out_EXMem;
wire [3:0] wea_out_EXMem;
Ex_reg_Mem PPLExMem (
.clk_EXMem(clk),
.rst_EXMem(rst),
.en_EXMem(1'b1),
.PC_in_EXMem(PC_out_EX),
.Imm_in_EXMem(Imm_out_IDEX),
.PC4_in_EXMem(PC4_out_EX),
.Rd_addr_EXMem(Rd_addr_out_IDEX),
.zero_in_EXMem(zero_out_EX),
.ALU_in_EXMem(ALU_out_EX),
.Rs2_in_EXMem(Rs2_out_EX),
.Branch_in_EXMem(Branch_out_IDEX),
.Jump_in_EXMem(Jump_out_IDEX),
.MemRW_in_EXMem(MemRW_out_IDEX),
.MemtoReg_in_EXMem(MemtoReg_out_IDEX),
.RegWrite_in_EXMem(RegWrite_out_IDEX),
.WHBU_in_EXMem(WHBU_out_IDEX),
.LSwea_in_EXMem(LSwea_out_IDEX),
.StoreData_in_EXMem(StoreData_out_IDEX),
.wea_in_EXMem(wea_out_IDEX),
.PC_out_EXMem(PC_out_EXMem),
.Imm_out_EXMem(Imm_out_EXMem),
.PC4_out_EXMem(PC4_out_EXMem),
.Rd_addr_out_EXMem(Rd_addr_out_EXMem),
.zero_out_EXMem(zero_out_EXMem),
.ALU_out_EXMem(ALU_out_EXMem),
.Rs2_out_EXMem(Rs2_out_EXMem),
.Branch_out_EXMem(Branch_out_EXMem),
.Jump_out_EXMem(Jump_out_EXMem),
.MemRW_out_EXMem(MemRW_out_EXMem),
.MemtoReg_out_EXMem(MemtoReg_out_EXMem),
.RegWrite_out_EXMem(RegWrite_out_EXMem),
.WHBU_out_EXMem(WHBU_out_EXMem),
.LSwea_out_EXMem(LSwea_out_EXMem),
.StoreData_out_EXMem(StoreData_out_EXMem),
.wea_out_EXMem(wea_out_EXMem)
);
Pipeline_Mem PPLMem (
.zero_in_Mem(zero_out_EXMem),
.res_in_Mem(ALU_out_EXMem),
.Branch_in_Mem(Branch_out_EXMem),
.Jump_in_Mem(Jump_out_EXMem),
.PCSrc(PCSrc)
);
wire [31:0] PC_out_MemWB, Imm_out_MemWB, PC4_out_MemWB,
ALU_out_MemWB, Dmem_data_out_MemWB;
wire [2:0] MemtoReg_out_MemWB;
//wire RegWrite_out_MemWB;
wire [3:0] WHBU_out_MemWB;
wire [1:0] LSwea_out_MemWB;
Mem_reg_WB PPLMemWB (
.clk_MemWB(clk),
.rst_MemWB(rst),
.en_MemWB(1'b1),
.PC_in_MemWB(PC_out_EXMem),
.Imm_in_MemWB(Imm_out_EXMem),
.PC4_in_MemWB(PC4_out_EXMem),
.Rd_addr_MemWB(Rd_addr_out_EXMem),
.ALU_in_MemWB(ALU_out_EXMem),
.Dmem_data_MemWB(Data_in),
.MemtoReg_in_MemWB(MemtoReg_out_EXMem),
.RegWrite_in_MemWB(RegWrite_out_EXMem),
.WHBU_in_MemWB(WHBU_out_EXMem),
.LSwea_in_MemWB(LSwea_out_EXMem),
.PC_out_MemWB(PC_out_MemWB),
.Imm_out_MemWB(Imm_out_MemWB),
.PC4_out_MemWB(PC4_out_MemWB),
.Rd_addr_out_MemWB(Rd_addr_out_MemWB),
.ALU_out_MemWB(ALU_out_MemWB),
.Dmem_data_out_MemWB(Dmem_data_out_MemWB),
.MemtoReg_out_MemWB(MemtoReg_out_MemWB),
.RegWrite_out_MemWB(RegWrite_out_MemWB),
.WHBU_out_MemWB(WHBU_out_MemWB),
.LSwea_out_MemWB(LSwea_out_MemWB)
);
Pipeline_WB PPLWB (
.PC4_in_WB(PC4_out_MemWB),
.ALU_in_WB(ALU_out_MemWB),
.Dmem_data_WB(Dmem_data_out_MemWB),
.Imm_in_WB(Imm_out_MemWB),
.PC_in_WB(PC_out_MemWB),
.MemtoReg_in_WB(MemtoReg_out_MemWB),
.WHBU_in_WB(WHBU_out_MemWB),
.LSwea_in_WB(LSwea_out_MemWB),
.Data_out_WB(Data_out_WB)
);
assign MemRW_EX = MemRW_out_IDEX;
assign MemRW_Mem = MemRW_out_EXMem;
assign Addr_out = ALU_out_EXMem;
assign Data_out = StoreData_out_EXMem;
assign wea = wea_out_EXMem;
assign PC_out_ID = PC_out_IFID;
assign inst_ID = inst_out_IFID;
endmodule
The top level instantiates \(9\) modules: \(5\) stage modules plus \(4\) pipeline registers
IF stage code:
module Pipeline_IF(
input clk_IF, // clock
input rst_IF, // reset
input en_IF, // PC write enable
input [31:0] PC_in_IF, // branch/jal target, = PC + Imm
input [2:0] PCSrc, // next-PC select
input [31:0] ALU_in_IF, // ALU result (jalr target)
output wire [31:0] PC_out_IF // current PC
);
reg [31:0] PC_in;
wire [31:0] PCAdd4;
assign PCAdd4 = PC_out_IF + 32'd4;
always @ (*) begin
case (PCSrc)
3'b000: PC_in = PCAdd4;
3'b100: PC_in = PC_in_IF; // branch
3'b001: PC_in = PC_in_IF; // jal
3'b010: PC_in = ALU_in_IF; // jalr
default: PC_in = 32'b0;
endcase
end
Reg PC(.clk(clk_IF), .rst(rst_IF), .CE(en_IF), .D(PC_in), .Q(PC_out_IF));
endmodule
Selects the next PC according to PCSrc and holds the current PC
ID stage code:
module Pipeline_ID(
input clk_ID, // clock
input rst_ID, // reset
input RegWrite_in_ID, // register file write enable (from WB)
input [4:0] Rd_addr_ID, // write destination address (from WB)
input [31:0] Wt_data_ID, // write data input (from WB)
input [31:0] inst_in_ID, // instruction input
output [4:0] Rd_addr_out_ID, // destination register address
output [31:0] Rs1_out_ID , // operand 1
output [31:0] Rs2_out_ID , // operand 2
output [31:0] Imm_out_ID , // immediate
output ALUSrc_B_ID , // ALU B input select
output [3:0] ALU_control_ID, // ALU operation
output [2:0] Branch_ID, // branch type
output MemRW_ID, // memory read/write
output [1:0] Jump_ID, // jump type (jal/jalr)
output [2:0] MemtoReg_ID, // write-back select
output RegWrite_out_ID, // register file write enable
output [3:0] WHBU_ID,
output [1:0] LSwea_ID,
output reg [31:0] StoreData_ID,
output reg [3:0] wea_ID,
output [31:0] Reg00, output [31:0] Reg01,output [31:0] Reg02,output [31:0] Reg03,
output [31:0] Reg04, output [31:0] Reg05,output [31:0] Reg06,output [31:0] Reg07,
output [31:0] Reg08, output [31:0] Reg09,output [31:0] Reg10,output [31:0] Reg11,
output [31:0] Reg12, output [31:0] Reg13,output [31:0] Reg14,output [31:0] Reg15,
output [31:0] Reg16, output [31:0] Reg17,output [31:0] Reg18,output [31:0] Reg19,
output [31:0] Reg20, output [31:0] Reg21,output [31:0] Reg22,output [31:0] Reg23,
output [31:0] Reg24, output [31:0] Reg25,output [31:0] Reg26,output [31:0] Reg27,
output [31:0] Reg28, output [31:0] Reg29,output [31:0] Reg30,output [31:0] Reg31
);
wire [4:0] RS1, RS2, WT;
wire [4:0] OP;
wire [2:0] FUN3;
wire FUN7;
assign RS1 = inst_in_ID[19:15];
assign RS2 = inst_in_ID[24:20];
assign WT = inst_in_ID[11:7];
assign OP = inst_in_ID[6:2];
assign FUN3 = inst_in_ID[14:12];
assign FUN7 = inst_in_ID[30];
assign LSwea_ID = Imm_out_ID[1:0];
wire [2:0] ImmSel;
ImmGen immgen(.ImmSel(ImmSel), .inst_field(inst_in_ID), .Imm_out(Imm_out_ID));
SCPU_ctrls Ctrl(.OPcode(OP), .Fun3(FUN3), .Fun7(FUN7),
.ImmSel(ImmSel), .ALUSrc_B(ALUSrc_B_ID), .MemtoReg(MemtoReg_ID),
.Jump(Jump_ID), .Branch(Branch_ID), .RegWrite(RegWrite_out_ID), .MemRW(MemRW_ID),
.ALU_Control(ALU_control_ID), .WHBU(WHBU_ID)
);
Regs regs(.clk(clk_ID), .rst(rst_ID), .Rs1_addr(RS1), .Rs2_addr(RS2),
.Wt_addr(Rd_addr_ID), .Wt_data(Wt_data_ID), .RegWrite(RegWrite_in_ID),
.Rs1_data(Rs1_out_ID), .Rs2_data(Rs2_out_ID),
.Reg00(Reg00),.Reg01(Reg01),.Reg02(Reg02),.Reg03(Reg03),
.Reg04(Reg04),.Reg05(Reg05),.Reg06(Reg06),.Reg07(Reg07),
.Reg08(Reg08),.Reg09(Reg09),.Reg10(Reg10),.Reg11(Reg11),
.Reg12(Reg12),.Reg13(Reg13),.Reg14(Reg14),.Reg15(Reg15),
.Reg16(Reg16),.Reg17(Reg17),.Reg18(Reg18),.Reg19(Reg19),
.Reg20(Reg20),.Reg21(Reg21),.Reg22(Reg22),.Reg23(Reg23),
.Reg24(Reg24),.Reg25(Reg25),.Reg26(Reg26),.Reg27(Reg27),
.Reg28(Reg28),.Reg29(Reg29),.Reg30(Reg30),.Reg31(Reg31)
);
assign Rd_addr_out_ID = inst_in_ID[11:7];
always @ (*) begin
case (LSwea_ID)
2'b00: begin
StoreData_ID = Rs2_out_ID;
case (WHBU_ID)
4'b0010: wea_ID = 4'b0001;
4'b0100: wea_ID = 4'b0011;
4'b1000: wea_ID = 4'b1111;
default: wea_ID = 0;
endcase
end
2'b01: begin
StoreData_ID = {Rs2_out_ID[23:0], 8'b0};
case (WHBU_ID)
4'b0010: wea_ID = 4'b0010;
4'b0100: wea_ID = 4'b0110;
default: wea_ID = 0;
endcase
end
2'b10: begin
StoreData_ID = {Rs2_out_ID[15:0], 16'b0};
case (WHBU_ID)
4'b0010: wea_ID = 4'b0100;
4'b0100: wea_ID = 4'b1100;
default: wea_ID = 0;
endcase
end
2'b11: begin
StoreData_ID = {Rs2_out_ID[7:0], 24'b0};
case (WHBU_ID)
4'b0010: wea_ID = 4'b1000;
default: wea_ID = 0;
endcase
end
endcase
end
endmodule
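Instantiates the register file, the immediate generator, and the control unit; it also produces the aligned StoreData and the byte write enables wea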
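The four-way case on the address offset can also be written with shifts. The following is a minimal sketch, not the code used in this lab: the module name store_align_sketch, the 2-bit size encoding (which differs from WHBU), and the testbench are assumptions made for illustration. It shifts rs2 left by 8 x offset bits and shifts a base byte mask by offset, dropping any access that would cross the word boundary, which mirrors the wea = 0 defaults of the case statement.

```verilog
`timescale 1ns / 1ps

// Hypothetical shift-based store alignment (sketch, not the lab's module).
// size: 0 = byte, 1 = halfword, 2 = word (assumed encoding, not WHBU).
module store_align_sketch (
    input  wire [31:0] rs2,        // value to store
    input  wire [1:0]  offset,     // address bits [1:0]
    input  wire [1:0]  size,
    output wire [31:0] store_data, // rs2 moved into its byte lanes
    output wire [3:0]  wea         // per-byte write enables
);
    // Byte mask for an access at offset 0.
    wire [3:0] base = (size == 2'd0) ? 4'b0001 :
                      (size == 2'd1) ? 4'b0011 :
                      (size == 2'd2) ? 4'b1111 : 4'b0000;
    // Shift inside a 7-bit field so bits that leave the word stay visible.
    wire [6:0] shifted = {3'b000, base} << offset;
    // An access that spills past byte 3 is dropped (wea = 0).
    assign wea        = (shifted[6:4] == 3'b000) ? shifted[3:0] : 4'b0000;
    assign store_data = rs2 << {offset, 3'b000}; // shift by 8 * offset bits
endmodule

module store_align_sketch_tb;
    reg  [31:0] rs2;
    reg  [1:0]  offset, size;
    wire [31:0] store_data;
    wire [3:0]  wea;
    integer errors;

    store_align_sketch dut (.rs2(rs2), .offset(offset), .size(size),
                            .store_data(store_data), .wea(wea));

    // Let the combinational outputs settle, then compare.
    task check(input [31:0] exp_data, input [3:0] exp_wea);
        begin
            #1;
            if (store_data !== exp_data || wea !== exp_wea) begin
                errors = errors + 1;
                $display("MISMATCH size=%0d offset=%0d data=%h wea=%b",
                         size, offset, store_data, wea);
            end
        end
    endtask

    initial begin
        errors = 0;
        rs2 = 32'h11223344;
        size = 2'd0; offset = 2'd3; check(32'h44000000, 4'b1000); // byte at offset 3
        size = 2'd1; offset = 2'd1; check(32'h22334400, 4'b0110); // halfword at offset 1
        size = 2'd1; offset = 2'd3; check(32'h44000000, 4'b0000); // crosses the word: dropped
        size = 2'd2; offset = 2'd0; check(32'h11223344, 4'b1111); // aligned word
        if (errors == 0) $display("all store checks passed");
        $finish;
    end
endmodule
```

The explicit case statement is easier to read next to a waveform; the shift form is shorter and makes the word-boundary rule a single comparison.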
EX stage code:
module Pipeline_Ex(
input[31:0] PC_in_EX, // PC input
input[31:0] Rs1_in_EX, // operand 1 input
input[31:0] Rs2_in_EX, // operand 2 input
input[31:0] Imm_in_EX, // immediate
input ALUSrc_B_in_EX, // ALU B input select
input [3:0] ALU_control_in_EX, // ALU operation
output [31:0] PC_out_EX, // PC output, PC + Imm
output [31:0] PC4_out_EX, // PC + 4 output
output zero_out_EX, // ALU zero flag
output [31:0] ALU_out_EX, // ALU result
output [31:0] Rs2_out_EX // operand 2 output
);
wire [31:0] ALUB;
assign PC_out_EX = PC_in_EX + Imm_in_EX; // PC_out = PC + Imm
assign PC4_out_EX = PC_in_EX + 32'd4;
assign Rs2_out_EX = Rs2_in_EX;
assign ALUB = (ALUSrc_B_in_EX) ? Imm_in_EX : Rs2_in_EX;
ALU alu(.A(Rs1_in_EX), .B(ALUB), .ALU_operation(ALU_control_in_EX), .res(ALU_out_EX),
.zero(zero_out_EX));
endmodule
Instantiates the ALU and computes PC + Imm and PC + 4
MEM stage code:
module Pipeline_Mem(
input zero_in_Mem, // ALU zero flag
input [31:0] res_in_Mem, // ALU result
input [2:0] Branch_in_Mem, // branch type
input [1:0] Jump_in_Mem, // jump type
output [2:0] PCSrc // next-PC select output
);
wire Branch_one;
assign Branch_one = (Branch_in_Mem == 3'b000) ? 1'b0 : 1'b1;
reg branch;
always @ (*) begin
case (Branch_in_Mem)
3'b001: branch = Branch_one & (zero_in_Mem); // BEQ
3'b010: branch = Branch_one & (~zero_in_Mem); // BNE
3'b011: branch = Branch_one & (res_in_Mem[0]); // BLT
3'b100: branch = Branch_one & (~res_in_Mem[0]); // BGE
3'b101: branch = Branch_one & (res_in_Mem[0]); // BLTU
3'b110: branch = Branch_one & (~res_in_Mem[0]); // BGEU
default: branch = 1'b0;
endcase
end
assign PCSrc = {branch, Jump_in_Mem};
endmodule
Generates PCSrc; the data RAM is also accessed during this stage
WB stage code:
module Pipeline_WB(
input [31:0] PC4_in_WB, // PC + 4
input [31:0] ALU_in_WB, // ALU result
input [31:0] Dmem_data_WB, // data read from memory
input [31:0] Imm_in_WB, // immediate (lui)
input [31:0] PC_in_WB, // PC + immediate (auipc)
input [3:0] WHBU_in_WB,
input [1:0] LSwea_in_WB,
input [2:0] MemtoReg_in_WB, // write-back select
output [31:0] Data_out_WB // write-back data output
);
reg [31:0] reg_wt_data;
reg [31:0] LoadData;
wire [31:0] Data_in;
assign Data_in = Dmem_data_WB;
assign Data_out_WB = reg_wt_data;
always @ (*) begin
case (MemtoReg_in_WB)
3'd0: reg_wt_data = ALU_in_WB; // ALU
3'd1: begin
case (WHBU_in_WB)
4'b0010: reg_wt_data = {{24{LoadData[7]}}, LoadData[7:0]}; // LB
4'b0100: reg_wt_data = {{16{LoadData[15]}}, LoadData[15:0]}; // LH
4'b1000: reg_wt_data = LoadData; // LW
4'b0011: reg_wt_data = {24'b0, LoadData[7:0]}; // LBU
4'b0101: reg_wt_data = {16'b0, LoadData[15:0]}; // LHU
default: reg_wt_data = 0;
endcase
end
3'd2: reg_wt_data = PC4_in_WB; // JAL
3'd3: reg_wt_data = Imm_in_WB; // lui
3'd4: reg_wt_data = PC_in_WB; // auipc
default: reg_wt_data = 32'b0;
endcase
end
always @ (*) begin
case (LSwea_in_WB)
2'b00: LoadData = Data_in;
2'b01: LoadData = {8'b0, Data_in[31:8]};
2'b10: LoadData = {16'b0, Data_in[31:16]};
2'b11: LoadData = {24'b0, Data_in[31:24]};
endcase
end
endmodule
Produces the write-back data
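As a worked example of the load path, the sketch below reproduces the two steps of the WB stage (shift the memory word right by 8 x offset bits, then sign- or zero-extend the selected width) in a small standalone module. The module name load_extract_sketch, the size and is_unsigned inputs, and the test word 0x80FF7F01 are assumptions chosen for illustration; the lab's module decodes the same choices from WHBU and LSwea.

```verilog
`timescale 1ns / 1ps

// Hypothetical standalone model of the WB-stage load extraction (sketch).
// size: 0 = byte, 1 = halfword, 2 = word (assumed encoding, not WHBU).
module load_extract_sketch (
    input  wire [31:0] mem_word,    // word read from the data RAM
    input  wire [1:0]  offset,      // address bits [1:0]
    input  wire [1:0]  size,
    input  wire        is_unsigned, // 1 for lbu/lhu
    output reg  [31:0] load_data
);
    // Move the addressed byte lane down to bits [7:0].
    wire [31:0] shifted = mem_word >> {offset, 3'b000};

    always @(*) begin
        case (size)
            2'd0: load_data = is_unsigned ? {24'b0, shifted[7:0]}
                                          : {{24{shifted[7]}}, shifted[7:0]};
            2'd1: load_data = is_unsigned ? {16'b0, shifted[15:0]}
                                          : {{16{shifted[15]}}, shifted[15:0]};
            2'd2: load_data = shifted;
            default: load_data = 32'b0;
        endcase
    end
endmodule

module load_extract_sketch_tb;
    reg  [31:0] mem_word;
    reg  [1:0]  offset, size;
    reg         is_unsigned;
    wire [31:0] load_data;
    integer errors;

    load_extract_sketch dut (.mem_word(mem_word), .offset(offset), .size(size),
                             .is_unsigned(is_unsigned), .load_data(load_data));

    // Let the combinational output settle, then compare.
    task check(input [31:0] expected);
        begin
            #1;
            if (load_data !== expected) begin
                errors = errors + 1;
                $display("MISMATCH size=%0d offset=%0d got=%h", size, offset, load_data);
            end
        end
    endtask

    initial begin
        errors = 0;
        mem_word = 32'h80FF7F01;
        size = 2'd0; offset = 2'd2; is_unsigned = 1'b0; check(32'hFFFFFFFF); // lb  reads byte 0xFF
        size = 2'd0; offset = 2'd2; is_unsigned = 1'b1; check(32'h000000FF); // lbu reads byte 0xFF
        size = 2'd0; offset = 2'd1; is_unsigned = 1'b0; check(32'h0000007F); // lb  reads byte 0x7F
        size = 2'd1; offset = 2'd2; is_unsigned = 1'b0; check(32'hFFFF80FF); // lh  reads 0x80FF
        size = 2'd1; offset = 2'd2; is_unsigned = 1'b1; check(32'h000080FF); // lhu reads 0x80FF
        size = 2'd2; offset = 2'd0; is_unsigned = 1'b0; check(32'h80FF7F01); // lw
        if (errors == 0) $display("all load checks passed");
        $finish;
    end
endmodule
```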
IFID register code:
module IF_reg_ID(
input clk_IFID, // register clock
input rst_IFID, // register reset
input en_IFID, // register enable
input [31:0] PC_in_IFID, // PC input
input [31:0] inst_in_IFID, // instruction input
output [31:0] PC_out_IFID, // PC output
output [31:0] inst_out_IFID // instruction output
);
Reg PC(.clk(clk_IFID), .rst(rst_IFID), .CE(en_IFID), .D(PC_in_IFID), .Q(PC_out_IFID));
Reg inst(.clk(clk_IFID), .rst(rst_IFID), .CE(en_IFID), .D(inst_in_IFID), .Q(inst_out_IFID));
endmodule
Passes the instruction and its PC on to the ID stage
IDEX register code:
module ID_reg_Ex(
input clk_IDEX, // register clock
input rst_IDEX, // register reset
input en_IDEX, // register enable
input [31:0] PC_in_IDEX, // PC input
input [4:0] Rd_addr_IDEX, // destination register address input
input [31:0] Rs1_in_IDEX, // operand 1 input
input [31:0] Rs2_in_IDEX, // operand 2 input
input [31:0] Imm_in_IDEX , // immediate input
input ALUSrc_B_in_IDEX , // ALU B input select
input [3:0] ALU_control_in_IDEX, // ALU operation
input [2:0] Branch_in_IDEX, // branch type
input MemRW_in_IDEX, // memory read/write
input [1:0] Jump_in_IDEX, // jump type
input [2:0] MemtoReg_in_IDEX, // write-back select
input RegWrite_in_IDEX, // register file write enable
input [3:0] WHBU_in_IDEX,
input [1:0] LSwea_in_IDEX,
input [31:0] StoreData_in_IDEX,
input [3:0] wea_in_IDEX,
output [31:0] PC_out_IDEX, // PC output
output [4:0] Rd_addr_out_IDEX, // destination register address output
output [31:0] Rs1_out_IDEX, // operand 1 output
output [31:0] Rs2_out_IDEX, // operand 2 output
output [31:0] Imm_out_IDEX , // immediate
output ALUSrc_B_out_IDEX , // ALU B input select
output [3:0] ALU_control_out_IDEX, // ALU operation
output [2:0] Branch_out_IDEX, // branch type
output MemRW_out_IDEX, // memory read/write
output [1:0] Jump_out_IDEX, // jump type
output [2:0] MemtoReg_out_IDEX, // write-back select
output RegWrite_out_IDEX, // register file write enable
output [3:0] WHBU_out_IDEX,
output [1:0] LSwea_out_IDEX,
output [31:0] StoreData_out_IDEX,
output [3:0] wea_out_IDEX
);
Reg PC(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(PC_in_IDEX), .Q(PC_out_IDEX));
Reg Rd_addr(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(Rd_addr_IDEX), .Q(Rd_addr_out_IDEX));
Reg Rs1(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(Rs1_in_IDEX), .Q(Rs1_out_IDEX));
Reg Rs2(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(Rs2_in_IDEX), .Q(Rs2_out_IDEX));
Reg Imm(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(Imm_in_IDEX), .Q(Imm_out_IDEX));
Reg ALUSrc_B(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(ALUSrc_B_in_IDEX), .Q(ALUSrc_B_out_IDEX));
Reg ALU_control(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(ALU_control_in_IDEX), .Q(ALU_control_out_IDEX));
Reg Branch(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(Branch_in_IDEX), .Q(Branch_out_IDEX));
Reg MemRW(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(MemRW_in_IDEX), .Q(MemRW_out_IDEX));
Reg Jump(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(Jump_in_IDEX), .Q(Jump_out_IDEX));
Reg MemtoReg(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(MemtoReg_in_IDEX), .Q(MemtoReg_out_IDEX));
Reg RegWrite(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(RegWrite_in_IDEX), .Q(RegWrite_out_IDEX));
Reg WHBU(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(WHBU_in_IDEX), .Q(WHBU_out_IDEX));
Reg LSwea(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(LSwea_in_IDEX), .Q(LSwea_out_IDEX));
Reg StoreData(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(StoreData_in_IDEX), .Q(StoreData_out_IDEX));
Reg wea(.clk(clk_IDEX), .rst(rst_IDEX), .CE(en_IDEX), .D(wea_in_IDEX), .Q(wea_out_IDEX));
endmodule
Passes the operands and control signals on to the EX stage
EXMEM register code:
module Ex_reg_Mem(
input clk_EXMem, // register clock
input rst_EXMem, // register reset
input en_EXMem, // register enable
input [31:0] PC_in_EXMem, // PC input (PC + Imm from EX)
input [31:0] Imm_in_EXMem,
input [31:0] PC4_in_EXMem, // PC + 4 input
input [4:0] Rd_addr_EXMem, // destination register address input
input zero_in_EXMem, // ALU zero flag
input [31:0] ALU_in_EXMem, // ALU result input
input [31:0] Rs2_in_EXMem, // operand 2 input
input [2:0] Branch_in_EXMem, // branch type
input MemRW_in_EXMem, // memory read/write
input [1:0] Jump_in_EXMem, // jump type
input [2:0] MemtoReg_in_EXMem, // write-back select
input RegWrite_in_EXMem, // register file write enable
input [3:0] WHBU_in_EXMem,
input [1:0] LSwea_in_EXMem,
input [31:0] StoreData_in_EXMem,
input [3:0] wea_in_EXMem,
output [31:0] PC_out_EXMem, // PC output
output [31:0] Imm_out_EXMem, // immediate output
output [31:0] PC4_out_EXMem, // PC + 4 output
output [4:0] Rd_addr_out_EXMem, // destination register address output
output zero_out_EXMem, // ALU zero flag
output [31:0] ALU_out_EXMem, // ALU result output
output [31:0] Rs2_out_EXMem, // operand 2 output
output [2:0] Branch_out_EXMem, // branch type
output MemRW_out_EXMem, // memory read/write
output [1:0] Jump_out_EXMem, // jump type
output [2:0] MemtoReg_out_EXMem, // write-back select
output RegWrite_out_EXMem, // register file write enable
output [3:0] WHBU_out_EXMem,
output [1:0] LSwea_out_EXMem,
output [31:0] StoreData_out_EXMem,
output [3:0] wea_out_EXMem
);
Reg PC(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(PC_in_EXMem), .Q(PC_out_EXMem));
Reg Imm(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(Imm_in_EXMem), .Q(Imm_out_EXMem));
Reg PC4(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(PC4_in_EXMem), .Q(PC4_out_EXMem));
Reg Rd_addr(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(Rd_addr_EXMem), .Q(Rd_addr_out_EXMem));
Reg zero(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(zero_in_EXMem), .Q(zero_out_EXMem));
Reg ALU(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(ALU_in_EXMem), .Q(ALU_out_EXMem));
Reg Rs2(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(Rs2_in_EXMem), .Q(Rs2_out_EXMem));
Reg Branch(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(Branch_in_EXMem), .Q(Branch_out_EXMem));
Reg MemRW(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(MemRW_in_EXMem), .Q(MemRW_out_EXMem));
Reg Jump(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(Jump_in_EXMem), .Q(Jump_out_EXMem));
Reg MemtoReg(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(MemtoReg_in_EXMem), .Q(MemtoReg_out_EXMem));
Reg RegWrite(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(RegWrite_in_EXMem), .Q(RegWrite_out_EXMem));
Reg WHBU(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(WHBU_in_EXMem), .Q(WHBU_out_EXMem));
Reg LSwea(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(LSwea_in_EXMem), .Q(LSwea_out_EXMem));
Reg StoreData(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(StoreData_in_EXMem), .Q(StoreData_out_EXMem));
Reg wea(.clk(clk_EXMem), .rst(rst_EXMem), .CE(en_EXMem), .D(wea_in_EXMem), .Q(wea_out_EXMem));
endmodule
Passes the computed values on to the MEM stage
MEMWB register code:
module Mem_reg_WB(
input clk_MemWB, // register clock
input rst_MemWB, // register reset
input en_MemWB, // register enable
input [31:0] PC_in_MemWB,
input [31:0] Imm_in_MemWB,
input [31:0] PC4_in_MemWB, // PC + 4 input
input [4:0] Rd_addr_MemWB, // destination register address input
input [31:0] ALU_in_MemWB, // ALU result input
input [31:0] Dmem_data_MemWB, // data read from memory
input [2:0] MemtoReg_in_MemWB, // write-back select
input RegWrite_in_MemWB, // register file write enable
input [3:0] WHBU_in_MemWB,
input [1:0] LSwea_in_MemWB,
output [31:0] PC_out_MemWB, // PC output
output [31:0] Imm_out_MemWB,
output [31:0] PC4_out_MemWB, // PC + 4 output
output [4:0] Rd_addr_out_MemWB, // destination register address output
output [31:0] ALU_out_MemWB, // ALU result output
output [31:0] Dmem_data_out_MemWB, // data read from memory
output [2:0] MemtoReg_out_MemWB, // write-back select
output RegWrite_out_MemWB, // register file write enable
output [3:0] WHBU_out_MemWB,
output [1:0] LSwea_out_MemWB
);
Reg PC(.clk(clk_MemWB), .rst(rst_MemWB), .CE(en_MemWB), .D(PC_in_MemWB), .Q(PC_out_MemWB));
Reg Imm(.clk(clk_MemWB), .rst(rst_MemWB), .CE(en_MemWB), .D(Imm_in_MemWB), .Q(Imm_out_MemWB));
Reg PC4(.clk(clk_MemWB), .rst(rst_MemWB), .CE(en_MemWB), .D(PC4_in_MemWB), .Q(PC4_out_MemWB));
Reg Rd_addr(.clk(clk_MemWB), .rst(rst_MemWB), .CE(en_MemWB), .D(Rd_addr_MemWB), .Q(Rd_addr_out_MemWB));
Reg ALU(.clk(clk_MemWB), .rst(rst_MemWB), .CE(en_MemWB), .D(ALU_in_MemWB), .Q(ALU_out_MemWB));
Reg Dmem_data(.clk(clk_MemWB), .rst(rst_MemWB), .CE(en_MemWB), .D(Dmem_data_MemWB), .Q(Dmem_data_out_MemWB));
Reg MemtoReg(.clk(clk_MemWB), .rst(rst_MemWB), .CE(en_MemWB), .D(MemtoReg_in_MemWB), .Q(MemtoReg_out_MemWB));
Reg RegWrite(.clk(clk_MemWB), .rst(rst_MemWB), .CE(en_MemWB), .D(RegWrite_in_MemWB), .Q(RegWrite_out_MemWB));
Reg WHBU(.clk(clk_MemWB), .rst(rst_MemWB), .CE(en_MemWB), .D(WHBU_in_MemWB), .Q(WHBU_out_MemWB));
Reg LSwea(.clk(clk_MemWB), .rst(rst_MemWB), .CE(en_MemWB), .D(LSwea_in_MemWB), .Q(LSwea_out_MemWB));
endmodule
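Key simulation steps
- Simulation assembly:
The test program is the Lab 4-3 program with \(3\) nop instructions inserted between every pair of instructions, because this version of the pipeline does not yet handle hazards
The expected final result is x31 = 666
- testbench module: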
module testbench(
input wire clk,
input wire rst
);
wire [31:0] Addr_out;
wire [31:0] Data_out;
wire CPU_MIO;
wire MemRW_Mem;
wire [3:0] memrw;
wire [3:0] MEMRW;
wire [31:0] PC_out_IF;
wire [31:0] douta;
wire [31:0] spo;
wire [3:0] wea;
assign memrw = {MemRW_Mem, MemRW_Mem, MemRW_Mem, MemRW_Mem};
assign MEMRW = memrw & wea;
Pipeline_CPU u0(
.clk(clk),
.rst(rst),
.Data_in(douta),
.inst_IF(spo),
.Addr_out(Addr_out),
.Data_out(Data_out),
.PC_out_IF(PC_out_IF),
.MemRW_Mem(MemRW_Mem),
.wea(wea)
);
RAM_B u1(
.clka(~clk),
.wea(MEMRW),
.addra(Addr_out[11:2]),
.dina(Data_out),
.douta(douta)
);
ROM_D u2(
.a(PC_out_IF[11:2]),
.spo(spo)
);
endmodule
Instantiates the pipelined CPU, RAM, and ROM modules and wires them together
- Simulation code:
module sim();
reg clk;
reg rst;
testbench m0(.clk(clk), .rst(rst));
initial begin
clk = 1'b0;
rst = 1'b1;
#50;
rst = 1'b0;
end
always #10 clk = ~clk;
endmodule
The simulation top level instantiates one testbench module and drives its clock and reset signals
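The simulation top only drives clk and rst, so the result has to be read off the waveform. A self-checking variant could end the run on its own. In the sketch below, x31_standin is a hypothetical stand-in for the CPU (it simply counts up to 666) so that the example runs by itself; in the real testbench the watched signal would be the CPU's Reg31 output, which the testbench module above does not currently expose. The 100 us timeout is likewise an arbitrary assumption.

```verilog
`timescale 1ns / 1ps

// Hypothetical stand-in for the CPU: counts up to 666 and then holds.
// Used only so that this sketch is runnable by itself.
module x31_standin (
    input  wire        clk,
    input  wire        rst,
    output reg  [31:0] x31
);
    always @(posedge clk or posedge rst) begin
        if (rst)
            x31 <= 32'd0;
        else if (x31 != 32'd666)
            x31 <= x31 + 32'd1;
    end
endmodule

module sim_selfcheck;
    reg clk;
    reg rst;
    wire [31:0] x31;

    x31_standin dut (.clk(clk), .rst(rst), .x31(x31));

    // Same clock and reset pattern as the sim module.
    initial begin
        clk = 1'b0;
        rst = 1'b1;
        #50;
        rst = 1'b0;
    end
    always #10 clk = ~clk;

    // Pass as soon as the watched register reaches the expected value.
    initial begin
        wait (rst === 1'b0);
        wait (x31 == 32'd666);
        $display("PASS: x31 = %0d at time %0t", x31, $time);
        $finish;
    end

    // Fail if the expected value never shows up (assumed 100 us budget).
    initial begin
        #100000;
        $display("FAIL: timeout, x31 = %0d", x31);
        $finish;
    end
endmodule
```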
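Results and analysis
Simulation results
- Initially all registers are 0; the pipelined CPU then runs normally, and the intermediate results match those of Lab 4-3
- At the end, register x31 = 666, which shows that the simulation succeeded
On-board results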
Lab 5-2
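Procedure and experimental steps
Code design and explanation
Stall
When a hazard occurs in the pipelined CPU, a stall is implemented by freezing the IF and ID stages and inserting a nop into the EX stage.
For a data hazard: when the rd of the instruction in EX, MEM, or WB equals an rs of the instruction in ID, there is a conflict and the pipeline must stall for one cycle
For a control hazard: when the Branch or Jump signal in EX or MEM is nonzero, the instruction is a branch or jump and the pipeline must stall for one cycle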
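Structural Hazard
If the register file writes on the rising clock edge, a value cannot be read in the same cycle in which it is written, which costs one extra stall cycle
Writing on the falling clock edge instead commits the value half a cycle earlier, so it is already readable before the next rising edge; a data hazard can then be resolved with only two stall cycles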
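Data Hazard
Data hazards are handled by forwarding
When the rd of the instruction in MEM or WB equals an rs of the instruction in EX (the register numbers carried in the ID/EX register), there is a conflict; the rd value is bypassed back to the EX stage and the ALU takes the forwarded value in place of the rs value read in ID
To handle the load-use case, the stall control module inserts one stall while the load is in EX; forwarding then supplies the loaded value
To handle the use-store case, the generation of StoreData is moved from the ID stage to the EX stage, and the forwarded data is selected as StoreData
The last case is lui-use: in this implementation the immediate ImmOut produced by lui does not pass through the ALU and can only be written back once it reaches the WB stage
So, to bypass the lui immediate from WB back to EX, the dependent instruction is stalled for one cycle while lui is in EX; the immediate can then be forwarded from the WB stage to the EX stage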
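Forwarding
The forwarding unit generates the Forward select signals
If the instruction in MEM writes a register other than x0 and its rd equals an rs of the instruction in EX, the unit outputs Forward = 10 (forward from EX/MEM)
Otherwise, if the instruction in WB writes a register other than x0 and its rd equals an rs of the instruction in EX, the unit outputs Forward = 01 (forward from MEM/WB)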
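Control Hazard
Control hazards are handled with stalls only, although the finished design turned out to be an odd hybrid of stalling and always-predict-not-taken
When a branch or jump is in the EX stage, the pipeline stalls for one cycle; when it is in the MEM stage, where PCSrc is produced, the value of PCSrc decides whether to flush the IF/ID and ID/EX pipeline registers
If the branch is taken, IF and ID are flushed, which amounts to a \(3\)-cycle stall; if it is not taken, the pipeline keeps running, which amounts to only the \(1\)-cycle stall in the EX stage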
Source code
RegFile module:
always @(negedge clk /*double bump*/ or posedge rst) begin
if (rst) begin
for (i = 0; i < 32; i = i + 1) begin
Reg[i] <= 32'b0;
end
end
else begin
if (RegWrite && (Wt_addr != 5'b0)) begin
Reg[Wt_addr] <= Wt_data;
end
else begin
Reg[Wt_addr] <= Reg[Wt_addr];
end
end
end
assign Rs1_data = Reg[Rs1_addr];
assign Rs2_data = Reg[Rs2_addr];
The register file now writes on the falling clock edge
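The effect of the falling-edge write can be checked in isolation. The sketch below wraps the same write logic in a hypothetical module regs_negedge_sketch with a single read port and a small testbench; the module name, the reduced port list, and the 20 ns clock period are assumptions. A value presented at a rising edge, as the WB stage would present it, is written at the following falling edge and is visible on the read port before the next rising edge, where the ID/EX register would latch it.

```verilog
`timescale 1ns / 1ps

// Hypothetical cut-down register file with falling-edge write (sketch).
module regs_negedge_sketch (
    input  wire        clk,
    input  wire        rst,
    input  wire        RegWrite,
    input  wire [4:0]  Rs1_addr,
    input  wire [4:0]  Wt_addr,
    input  wire [31:0] Wt_data,
    output wire [31:0] Rs1_data
);
    reg [31:0] Reg [0:31];
    integer i;

    always @(negedge clk or posedge rst) begin
        if (rst) begin
            for (i = 0; i < 32; i = i + 1)
                Reg[i] <= 32'b0;
        end
        else if (RegWrite && (Wt_addr != 5'b0)) begin
            Reg[Wt_addr] <= Wt_data; // x0 is never written
        end
    end

    assign Rs1_data = Reg[Rs1_addr]; // combinational read
endmodule

module regs_negedge_sketch_tb;
    reg         clk, rst, RegWrite;
    reg  [4:0]  Rs1_addr, Wt_addr;
    reg  [31:0] Wt_data;
    wire [31:0] Rs1_data;

    regs_negedge_sketch dut (.clk(clk), .rst(rst), .RegWrite(RegWrite),
                             .Rs1_addr(Rs1_addr), .Wt_addr(Wt_addr),
                             .Wt_data(Wt_data), .Rs1_data(Rs1_data));

    always #10 clk = ~clk;

    initial begin
        clk = 1'b0; rst = 1'b0; RegWrite = 1'b0;
        Rs1_addr = 5'd0; Wt_addr = 5'd0; Wt_data = 32'd0;
        #1 rst = 1'b1;              // explicit reset pulse
        #4 rst = 1'b0;
        @(posedge clk);             // WB presents its write; ID reads the same register
        RegWrite = 1'b1; Wt_addr = 5'd5; Wt_data = 32'd666; Rs1_addr = 5'd5;
        #1;
        if (Rs1_data !== 32'd0)
            $display("FAIL: value visible before the falling edge");
        @(negedge clk);             // the write happens here
        #1;
        if (Rs1_data !== 32'd666)
            $display("FAIL: value not visible after the falling edge");
        else
            $display("PASS: value readable in the second half of the cycle");
        $finish;
    end
endmodule
```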
Stall module:
module stall_control (
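input wire [4:0] ID_rs1, // rs1 of the instruction in ID
input wire [4:0] ID_rs2, // rs2 of the instruction in ID
input wire [4:0] IDEX_rd, // rd of the instruction in EX (ID/EX register output)
input wire [4:0] EXMem_rd, // rd of the instruction in MEM (EX/MEM register output)
input wire [4:0] MemWB_rd, // rd of the instruction in WB (MEM/WB register output)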
input wire MemWB_RegWrite, //RegWrite_out
input wire EXMem_RegWrite, //RegWrite_out
input wire IDEX_RegWrite, //RegWrite_out
input wire [2:0] IDEX_MemtoReg, //MemtoReg_out
input wire [2:0] Mem_PCSrc,
input wire [2:0] IDEX_Branch,
input wire [2:0] EXMem_Branch,
input wire [1:0] IDEX_Jump,
input wire [1:0] EXMem_Jump,
output reg PC_en, // PC write enable
output reg IFID_en, // IF/ID write enable
output reg IDEX_Bubble, // insert a bubble into ID/EX
output reg IFID_flush,
output reg IDEX_flush
);
wire stall, DataHazard, ControlHazard;
// with forwarding
assign DataHazard = (
(IDEX_RegWrite == 1'b1 && IDEX_MemtoReg == 3'b1 && (IDEX_rd != 5'b0) &&
((ID_rs1 == IDEX_rd) || (ID_rs2 == IDEX_rd))) || // Load-Use Hazard
(IDEX_RegWrite == 1'b1 && IDEX_MemtoReg == 3'b11 && (IDEX_rd != 5'b0) &&
((ID_rs1 == IDEX_rd) || (ID_rs2 == IDEX_rd))) // Lui-Use Hazard
);
// without forwarding
// assign DataHazard = (
// (IDEX_RegWrite && (IDEX_rd != 5'b0) &&
// ((ID_rs1 == IDEX_rd) || (ID_rs2 == IDEX_rd))) ||
// (EXMem_RegWrite && (EXMem_rd != 5'b0) &&
// ((ID_rs1 == EXMem_rd) || (ID_rs2 == EXMem_rd)))
// // ||
// // (MemWB_RegWrite && (MemWB_rd != 5'b0) &&
// // ((ID_rs1 == MemWB_rd) || (ID_rs2 == MemWB_rd)))
// );
assign ControlHazard = ((IDEX_Branch != 3'b0) || (EXMem_Branch != 3'b0) ||
(IDEX_Jump != 2'b0) || (EXMem_Jump != 2'b0));
assign stall = (DataHazard || ControlHazard);
always @(*) begin
if (stall) begin
if (EXMem_Branch != 3'b0 || EXMem_Jump != 2'b0) begin
PC_en = 1;
IFID_en = 1;
IDEX_Bubble = 0;
if(Mem_PCSrc != 3'b0) begin
IFID_flush = 1;
IDEX_flush = 1;
end
else begin
IFID_flush = 0;
IDEX_flush = 0;
end
end
else begin
PC_en = 0;
IFID_en = 0;
IFID_flush = 0;
IDEX_flush = 0;
IDEX_Bubble = 1;
end
end
else begin
PC_en = 1;
IFID_en = 1;
IFID_flush = 0;
IDEX_flush = 0;
IDEX_Bubble = 0;
end
end
endmodule
Forwarding module:
module ForwardingUnit(
input [4:0] EXMem_Rd,
input [4:0] MemWB_Rd,
input [4:0] IDEX_Rs1,
input [4:0] IDEX_Rs2,
input EXMem_RegWrite,
input MemWB_RegWrite,
output reg [1:0] ForwardA,
output reg [1:0] ForwardB
);
wire EXMemForwardA, EXMemForwardB, MemWBForwardA, MemWBForwardB;
assign EXMemForwardA = EXMem_RegWrite && (EXMem_Rd != 0) && (EXMem_Rd == IDEX_Rs1);
assign EXMemForwardB = EXMem_RegWrite && (EXMem_Rd != 0) && (EXMem_Rd == IDEX_Rs2);
assign MemWBForwardA = MemWB_RegWrite && (MemWB_Rd != 0) && (MemWB_Rd == IDEX_Rs1);
assign MemWBForwardB = MemWB_RegWrite && (MemWB_Rd != 0) && (MemWB_Rd == IDEX_Rs2);
always @(*) begin
// Forwarding for Rs1
if (EXMemForwardA) begin
ForwardA = 2'b10; // Forward from EX/MEM
end
else if (MemWBForwardA && (~EXMemForwardA)) begin
ForwardA = 2'b01; // Forward from MEM/WB
end
else begin
ForwardA = 2'b00; // No forwarding
end
// Forwarding for Rs2
if (EXMemForwardB) begin
ForwardB = 2'b10; // Forward from EX/MEM
end
else if (MemWBForwardB && (~EXMemForwardB)) begin
ForwardB = 2'b01; // Forward from MEM/WB
end
else begin
ForwardB = 2'b00; // No forwarding
end
end
endmodule
- Revised EX stage code:
module Pipeline_Ex(
input [31:0] PC_in_EX, // PC input
input [31:0] Rs1_in_EX, // operand 1 input
input [31:0] Rs2_in_EX, // operand 2 input
input [31:0] Imm_in_EX, // immediate
input [3:0] WHBU_in_EX,
input [1:0] LSwea_in_EX,
input ALUSrc_B_in_EX, // ALU B input select
input [3:0] ALU_control_in_EX, // ALU operation
input [1:0] ForwardA,
input [1:0] ForwardB,
input [31:0] ALU_out_EXMem,
input [31:0] Data_out_WB,
output [31:0] PC_out_EX, // PC output, PC + Imm
output [31:0] PC4_out_EX, // PC + 4 output
output zero_out_EX, // ALU zero flag
output [31:0] ALU_out_EX, // ALU result
output [31:0] Rs2_out_EX, // operand 2 output (after forwarding)
output reg [31:0] StoreData_out_EX, // store data output
output reg [3:0] wea_out_EX // byte write enable output
);
wire [31:0] ALUB;
assign PC_out_EX = PC_in_EX + Imm_in_EX; // PC_out = PC + Imm
assign PC4_out_EX = PC_in_EX + 32'd4;
reg [31:0] ForwardA_data, ForwardB_data;
always @(*) begin
case (ForwardA)
2'b00: ForwardA_data = Rs1_in_EX;
2'b01: ForwardA_data = Data_out_WB; // Load data / previous ALU result
2'b10: ForwardA_data = ALU_out_EXMem; // ALU result
default: ForwardA_data = 32'b0;
endcase
end
always @(*) begin
case (ForwardB)
2'b00: ForwardB_data = Rs2_in_EX;
2'b01: ForwardB_data = Data_out_WB;
2'b10: ForwardB_data = ALU_out_EXMem;
default: ForwardB_data = 32'b0;
endcase
end
assign Rs2_out_EX = ForwardB_data;
assign ALUB = (ALUSrc_B_in_EX) ? Imm_in_EX : ForwardB_data;
ALU alu(.A(ForwardA_data), .B(ALUB), .ALU_operation(ALU_control_in_EX), .res(ALU_out_EX),
.zero(zero_out_EX));
always @ (*) begin // StoreData is now generated in the EX stage
case (LSwea_in_EX)
2'b00: begin
StoreData_out_EX = Rs2_out_EX;
case (WHBU_in_EX)
4'b0010: wea_out_EX = 4'b0001;
4'b0100: wea_out_EX = 4'b0011;
4'b1000: wea_out_EX = 4'b1111;
default: wea_out_EX = 0;
endcase
end
2'b01: begin
StoreData_out_EX = {Rs2_out_EX[23:0], 8'b0};
case (WHBU_in_EX)
4'b0010: wea_out_EX = 4'b0010;
4'b0100: wea_out_EX = 4'b0110;
default: wea_out_EX = 0;
endcase
end
2'b10: begin
StoreData_out_EX = {Rs2_out_EX[15:0], 16'b0};
case (WHBU_in_EX)
4'b0010: wea_out_EX = 4'b0100;
4'b0100: wea_out_EX = 4'b1100;
default: wea_out_EX = 0;
endcase
end
2'b11: begin
StoreData_out_EX = {Rs2_out_EX[7:0], 24'b0};
case (WHBU_in_EX)
4'b0010: wea_out_EX = 4'b1000;
default: wea_out_EX = 0;
endcase
end
endcase
end
endmodule
StoreData is now generated in the EX stage, and ALU_out_EXMem and Data_out_WB are brought in as the forwarding sources
- Top-level CPU code:
module Pipeline_CPU(
input clk,
input rst,
input [31:0] Data_in, // data input
input [31:0] inst_IF, // instruction input
output [31:0] Addr_out, // address output
output [31:0] Data_out, // data output
output [31:0] Data_out_WB, // write-back data output
output [31:0] PC_out_IF, // PC in the IF stage
output [31:0] inst_ID, // instruction in the ID stage
output [31:0] PC_out_ID, // PC in the ID stage
output [31:0] PC_out_EX, // PC output of the EX stage
output MemRW_EX, // memory read/write in the EX stage
output MemRW_Mem, // memory read/write in the MEM stage
output [3:0] wea,
output [31:0] Reg00, output [31:0] Reg01,output [31:0] Reg02,output [31:0] Reg03,
output [31:0] Reg04, output [31:0] Reg05,output [31:0] Reg06,output [31:0] Reg07,
output [31:0] Reg08, output [31:0] Reg09,output [31:0] Reg10,output [31:0] Reg11,
output [31:0] Reg12, output [31:0] Reg13,output [31:0] Reg14,output [31:0] Reg15,
output [31:0] Reg16, output [31:0] Reg17,output [31:0] Reg18,output [31:0] Reg19,
output [31:0] Reg20, output [31:0] Reg21,output [31:0] Reg22,output [31:0] Reg23,
output [31:0] Reg24, output [31:0] Reg25,output [31:0] Reg26,output [31:0] Reg27,
output [31:0] Reg28, output [31:0] Reg29,output [31:0] Reg30,output [31:0] Reg31
);
wire [31:0] PC_out_EXMem, ALU_out_EXMem;
// Declared here, ahead of their first use in the instances below
wire [4:0] Rd_addr_out_MemWB, Rd_addr_out_EXMem;
wire RegWrite_out_MemWB, RegWrite_out_EXMem;
wire [2:0] PCSrc;
wire PC_en, IFID_en, IDEX_Bubble;
Pipeline_IF PPLIF (
.clk_IF(clk),
.rst_IF(rst),
.en_IF(PC_en),
.PC_in_IF(PC_out_EXMem),
.PCSrc(PCSrc),
.ALU_in_IF(ALU_out_EXMem),
.PC_out_IF(PC_out_IF)
);
wire [31:0] PC_out_IFID, inst_out_IFID;
wire IFID_flush, IDEX_flush;
IF_reg_ID PPLIFID (
.clk_IFID(clk),
.rst_IFID(rst),
.flush(IFID_flush),
.en_IFID(IFID_en),
.PC_in_IFID(PC_out_IF),
.inst_in_IFID(inst_IF),
.PC_out_IFID(PC_out_IFID),
.inst_out_IFID(inst_out_IFID)
);
wire [31:0] Imm_out_ID, Rs1_out_ID, Rs2_out_ID;
wire ALUSrc_B_ID, MemRW_ID, RegWrite_out_ID;
wire [4:0] Rd_addr_out_ID, Rs1_addr_out_ID, Rs2_addr_out_ID;
wire [3:0] ALU_control_ID;
wire [2:0] Branch_ID;
wire [1:0] Jump_ID;
wire [2:0] MemtoReg_ID;
wire [3:0] WHBU_ID;
wire [1:0] LSwea_ID;
wire [31:0] StoreData_ID;
wire [3:0] wea_ID;
Pipeline_ID PPLID (
.clk_ID(clk),
.rst_ID(rst),
.RegWrite_in_ID(RegWrite_out_MemWB),
.Rd_addr_ID(Rd_addr_out_MemWB),
.Wt_data_ID(Data_out_WB),
.inst_in_ID(inst_out_IFID),
.Rd_addr_out_ID(Rd_addr_out_ID),
.Rs1_out_ID(Rs1_out_ID),
.Rs2_out_ID(Rs2_out_ID),
.Imm_out_ID(Imm_out_ID),
.ALUSrc_B_ID(ALUSrc_B_ID),
.ALU_control_ID(ALU_control_ID),
.Branch_ID(Branch_ID),
.MemRW_ID(MemRW_ID),
.Jump_ID(Jump_ID),
.MemtoReg_ID(MemtoReg_ID),
.RegWrite_out_ID(RegWrite_out_ID),
.Rs1_addr_out_ID(Rs1_addr_out_ID),
.Rs2_addr_out_ID(Rs2_addr_out_ID),
.WHBU_ID(WHBU_ID),
.LSwea_ID(LSwea_ID),
// .StoreData_ID(StoreData_ID),
// .wea_ID(wea_ID),
.Reg00(Reg00),.Reg01(Reg01),.Reg02(Reg02),.Reg03(Reg03),
.Reg04(Reg04),.Reg05(Reg05),.Reg06(Reg06),.Reg07(Reg07),
.Reg08(Reg08),.Reg09(Reg09),.Reg10(Reg10),.Reg11(Reg11),
.Reg12(Reg12),.Reg13(Reg13),.Reg14(Reg14),.Reg15(Reg15),
.Reg16(Reg16),.Reg17(Reg17),.Reg18(Reg18),.Reg19(Reg19),
.Reg20(Reg20),.Reg21(Reg21),.Reg22(Reg22),.Reg23(Reg23),
.Reg24(Reg24),.Reg25(Reg25),.Reg26(Reg26),.Reg27(Reg27),
.Reg28(Reg28),.Reg29(Reg29),.Reg30(Reg30),.Reg31(Reg31)
);
wire [31:0] PC_out_IDEX, Imm_out_IDEX, Rs1_out_IDEX, Rs2_out_IDEX;
wire [4:0] Rd_addr_out_IDEX;
wire ALUSrc_B_out_IDEX, MemRW_out_IDEX, RegWrite_out_IDEX;
wire [3:0] ALU_control_out_IDEX;
wire [2:0] Branch_out_IDEX;
wire [1:0] Jump_out_IDEX;
wire [2:0] MemtoReg_out_IDEX;
wire [3:0] WHBU_out_IDEX;
wire [1:0] LSwea_out_IDEX;
wire [31:0] StoreData_out_IDEX;
wire [4:0] Rs1_addr_out_IDEX, Rs2_addr_out_IDEX;
wire [3:0] wea_out_IDEX;
ID_reg_Ex PPLIDEx (
.clk_IDEX(clk),
.rst_IDEX(rst),
.en_IDEX(1'b1),
.Bubble(IDEX_Bubble),
.flush(IDEX_flush),
.PC_in_IDEX(PC_out_IFID),
.Rd_addr_IDEX(Rd_addr_out_ID),
.Rs1_addr_IDEX(Rs1_addr_out_ID),
.Rs2_addr_IDEX(Rs2_addr_out_ID),
.Rs1_in_IDEX(Rs1_out_ID),
.Rs2_in_IDEX(Rs2_out_ID),
.Imm_in_IDEX(Imm_out_ID),
.ALUSrc_B_in_IDEX(ALUSrc_B_ID),
.ALU_control_in_IDEX(ALU_control_ID),
.Branch_in_IDEX(Branch_ID),
.Jump_in_IDEX(Jump_ID),
.MemRW_in_IDEX(MemRW_ID),
.MemtoReg_in_IDEX(MemtoReg_ID),
.RegWrite_in_IDEX(RegWrite_out_ID),
.WHBU_in_IDEX(WHBU_ID),
.LSwea_in_IDEX(LSwea_ID),
// .StoreData_in_IDEX(StoreData_ID),
// .wea_in_IDEX(wea_ID),
.PC_out_IDEX(PC_out_IDEX),
.Rd_addr_out_IDEX(Rd_addr_out_IDEX),
.Rs1_addr_out_IDEX(Rs1_addr_out_IDEX),
.Rs2_addr_out_IDEX(Rs2_addr_out_IDEX),
.Rs1_out_IDEX(Rs1_out_IDEX),
.Rs2_out_IDEX(Rs2_out_IDEX),
.Imm_out_IDEX(Imm_out_IDEX),
.ALUSrc_B_out_IDEX(ALUSrc_B_out_IDEX),
.ALU_control_out_IDEX(ALU_control_out_IDEX),
.Branch_out_IDEX(Branch_out_IDEX),
.Jump_out_IDEX(Jump_out_IDEX),
.MemRW_out_IDEX(MemRW_out_IDEX),
.MemtoReg_out_IDEX(MemtoReg_out_IDEX),
.RegWrite_out_IDEX(RegWrite_out_IDEX),
.WHBU_out_IDEX(WHBU_out_IDEX),
.LSwea_out_IDEX(LSwea_out_IDEX)
// .StoreData_out_IDEX(StoreData_out_IDEX),
// .wea_out_IDEX(wea_out_IDEX)
);
wire [31:0] PC4_out_EX, ALU_out_EX;
wire [31:0] Rs2_out_EX;
wire zero_out_EX;
wire [1:0] ForwardA, ForwardB;
// New: forwarding module
ForwardingUnit PPLForwarding (
.IDEX_Rs1(Rs1_addr_out_IDEX),
.IDEX_Rs2(Rs2_addr_out_IDEX),
.EXMem_Rd(Rd_addr_out_EXMem),
.MemWB_Rd(Rd_addr_out_MemWB),
.EXMem_RegWrite(RegWrite_out_EXMem),
.MemWB_RegWrite(RegWrite_out_MemWB),
.ForwardA(ForwardA),
.ForwardB(ForwardB)
);
wire [31:0] StoreData_out_EX;
wire [3:0] wea_out_EX;
// StoreData is now generated in EX
Pipeline_Ex PPLEx (
.PC_in_EX(PC_out_IDEX),
.Rs1_in_EX(Rs1_out_IDEX),
.Rs2_in_EX(Rs2_out_IDEX),
.Imm_in_EX(Imm_out_IDEX),
.WHBU_in_EX(WHBU_out_IDEX),
.LSwea_in_EX(LSwea_out_IDEX),
.ALUSrc_B_in_EX(ALUSrc_B_out_IDEX),
.ALU_control_in_EX(ALU_control_out_IDEX),
.ForwardA(ForwardA),
.ForwardB(ForwardB),
.ALU_out_EXMem(ALU_out_EXMem),
.Data_out_WB(Data_out_WB),
.PC_out_EX(PC_out_EX),
.PC4_out_EX(PC4_out_EX),
.zero_out_EX(zero_out_EX),
.ALU_out_EX(ALU_out_EX),
.Rs2_out_EX(Rs2_out_EX),
.StoreData_out_EX(StoreData_out_EX),
.wea_out_EX(wea_out_EX)
);
wire [31:0] PC4_out_EXMem, Imm_out_EXMem, Rs2_out_EXMem;
wire zero_out_EXMem, MemRW_out_EXMem;
wire [2:0] Branch_out_EXMem;
wire [1:0] Jump_out_EXMem;
wire [2:0] MemtoReg_out_EXMem;
wire [3:0] WHBU_out_EXMem;
wire [1:0] LSwea_out_EXMem;
wire [31:0] StoreData_out_EXMem;
wire [3:0] wea_out_EXMem;
Ex_reg_Mem PPLExMem (
.clk_EXMem(clk),
.rst_EXMem(rst),
.en_EXMem(1'b1),
.PC_in_EXMem(PC_out_EX),
.Imm_in_EXMem(Imm_out_IDEX),
.PC4_in_EXMem(PC4_out_EX),
.Rd_addr_EXMem(Rd_addr_out_IDEX),
.zero_in_EXMem(zero_out_EX),
.ALU_in_EXMem(ALU_out_EX),
.Rs2_in_EXMem(Rs2_out_EX),
.Branch_in_EXMem(Branch_out_IDEX),
.Jump_in_EXMem(Jump_out_IDEX),
.MemRW_in_EXMem(MemRW_out_IDEX),
.MemtoReg_in_EXMem(MemtoReg_out_IDEX),
.RegWrite_in_EXMem(RegWrite_out_IDEX),
.WHBU_in_EXMem(WHBU_out_IDEX),
.LSwea_in_EXMem(LSwea_out_IDEX),
.StoreData_in_EXMem(StoreData_out_EX),
.wea_in_EXMem(wea_out_EX),
.PC_out_EXMem(PC_out_EXMem),
.Imm_out_EXMem(Imm_out_EXMem),
.PC4_out_EXMem(PC4_out_EXMem),
.Rd_addr_out_EXMem(Rd_addr_out_EXMem),
.zero_out_EXMem(zero_out_EXMem),
.ALU_out_EXMem(ALU_out_EXMem),
.Rs2_out_EXMem(Rs2_out_EXMem),
.Branch_out_EXMem(Branch_out_EXMem),
.Jump_out_EXMem(Jump_out_EXMem),
.MemRW_out_EXMem(MemRW_out_EXMem),
.MemtoReg_out_EXMem(MemtoReg_out_EXMem),
.RegWrite_out_EXMem(RegWrite_out_EXMem),
.WHBU_out_EXMem(WHBU_out_EXMem),
.LSwea_out_EXMem(LSwea_out_EXMem),
.StoreData_out_EXMem(StoreData_out_EXMem),
.wea_out_EXMem(wea_out_EXMem)
);
Pipeline_Mem PPLMem (
.zero_in_Mem(zero_out_EXMem),
.res_in_Mem(ALU_out_EXMem),
.Branch_in_Mem(Branch_out_EXMem),
.Jump_in_Mem(Jump_out_EXMem),
.PCSrc(PCSrc)
);
wire [31:0] PC_out_MemWB, Imm_out_MemWB, PC4_out_MemWB,
ALU_out_MemWB, Dmem_data_out_MemWB;
wire [2:0] MemtoReg_out_MemWB;
//wire RegWrite_out_MemWB;
wire [3:0] WHBU_out_MemWB;
wire [1:0] LSwea_out_MemWB;
Mem_reg_WB PPLMemWB (
.clk_MemWB(clk),
.rst_MemWB(rst),
.en_MemWB(1'b1),
.PC_in_MemWB(PC_out_EXMem),
.Imm_in_MemWB(Imm_out_EXMem),
.PC4_in_MemWB(PC4_out_EXMem),
.Rd_addr_MemWB(Rd_addr_out_EXMem),
.ALU_in_MemWB(ALU_out_EXMem),
.Dmem_data_MemWB(Data_in),
.MemtoReg_in_MemWB(MemtoReg_out_EXMem),
.RegWrite_in_MemWB(RegWrite_out_EXMem),
.WHBU_in_MemWB(WHBU_out_EXMem),
.LSwea_in_MemWB(LSwea_out_EXMem),
.PC_out_MemWB(PC_out_MemWB),
.Imm_out_MemWB(Imm_out_MemWB),
.PC4_out_MemWB(PC4_out_MemWB),
.Rd_addr_out_MemWB(Rd_addr_out_MemWB),
.ALU_out_MemWB(ALU_out_MemWB),
.Dmem_data_out_MemWB(Dmem_data_out_MemWB),
.MemtoReg_out_MemWB(MemtoReg_out_MemWB),
.RegWrite_out_MemWB(RegWrite_out_MemWB),
.WHBU_out_MemWB(WHBU_out_MemWB),
.LSwea_out_MemWB(LSwea_out_MemWB)
);
Pipeline_WB PPLWB (
.PC4_in_WB(PC4_out_MemWB),
.ALU_in_WB(ALU_out_MemWB),
.Dmem_data_WB(Dmem_data_out_MemWB),
.Imm_in_WB(Imm_out_MemWB),
.PC_in_WB(PC_out_MemWB),
.MemtoReg_in_WB(MemtoReg_out_MemWB),
.WHBU_in_WB(WHBU_out_MemWB),
.LSwea_in_WB(LSwea_out_MemWB),
.Data_out_WB(Data_out_WB)
);
assign MemRW_EX = MemRW_out_IDEX;
assign MemRW_Mem = MemRW_out_EXMem;
assign Addr_out = ALU_out_EXMem;
assign Data_out = StoreData_out_EXMem;
assign wea = wea_out_EXMem;
assign PC_out_ID = PC_out_IFID;
assign inst_ID = inst_out_IFID;
// New: stall module
stall_control sc(
.ID_rs1(Rs1_addr_out_ID),
.ID_rs2(Rs2_addr_out_ID),
.IDEX_rd(Rd_addr_out_IDEX),
.EXMem_rd(Rd_addr_out_EXMem),
.MemWB_rd(Rd_addr_out_MemWB),
.EXMem_RegWrite(RegWrite_out_EXMem),
.IDEX_RegWrite(RegWrite_out_IDEX),
.MemWB_RegWrite(RegWrite_out_MemWB),
.IDEX_MemtoReg(MemtoReg_out_IDEX),
.IDEX_Branch(Branch_out_IDEX),
.EXMem_Branch(Branch_out_EXMem),
.IDEX_Jump(Jump_out_IDEX),
.EXMem_Jump(Jump_out_EXMem),
.Mem_PCSrc(PCSrc),
.PC_en(PC_en),
.IFID_en(IFID_en),
.IDEX_Bubble(IDEX_Bubble),
.IFID_flush(IFID_flush),
.IDEX_flush(IDEX_flush)
);
endmodule
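The top level connects flush ports on IF_reg_ID and ID_reg_Ex and a Bubble port on ID_reg_Ex, but the revised register modules are not listed in this report. The sketch below shows one way such a register could behave; pipe_reg_sketch, its WIDTH parameter, and the synchronous clear are assumptions, not the author's implementation. Clearing a field to zero acts as a nop only under the further assumption that all-zero control signals (RegWrite = 0, MemRW = 0, Branch = 0, Jump = 0) have no side effects.

```verilog
`timescale 1ns / 1ps

// Hypothetical pipeline register with enable and synchronous flush (sketch).
module pipe_reg_sketch #(
    parameter WIDTH = 32
) (
    input  wire             clk,
    input  wire             rst,
    input  wire             en,    // 0 holds the current value (stall)
    input  wire             flush, // 1 clears the field at the next rising edge
    input  wire [WIDTH-1:0] D,
    output reg  [WIDTH-1:0] Q
);
    always @(posedge clk or posedge rst) begin
        if (rst)
            Q <= {WIDTH{1'b0}};
        else if (flush)
            Q <= {WIDTH{1'b0}};  // flush takes priority over enable
        else if (en)
            Q <= D;
    end
endmodule

module pipe_reg_sketch_tb;
    reg        clk, rst, en, flush;
    reg  [7:0] D;
    wire [7:0] Q;
    integer errors;

    pipe_reg_sketch #(.WIDTH(8)) dut (.clk(clk), .rst(rst), .en(en),
                                      .flush(flush), .D(D), .Q(Q));

    always #5 clk = ~clk;

    initial begin
        errors = 0;
        clk = 1'b0; rst = 1'b0; en = 1'b1; flush = 1'b0; D = 8'hA5;
        #1 rst = 1'b1;
        #1 rst = 1'b0;
        @(posedge clk); #1;                 // normal load
        if (Q !== 8'hA5) begin errors = errors + 1; $display("FAIL: load"); end
        en = 1'b0; D = 8'h3C;
        @(posedge clk); #1;                 // stalled: value is held
        if (Q !== 8'hA5) begin errors = errors + 1; $display("FAIL: hold"); end
        flush = 1'b1;
        @(posedge clk); #1;                 // flushed: value is cleared
        if (Q !== 8'h00) begin errors = errors + 1; $display("FAIL: flush"); end
        if (errors == 0) $display("all pipeline register checks passed");
        $finish;
    end
endmodule
```

Under these assumptions, the control fields of ID_reg_Ex could be cleared when either IDEX_flush or IDEX_Bubble is high, since both cases turn the instruction entering EX into a nop.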
Key simulation steps
- Simulation assembly
auipc x1, 0
j start # 00
dummy:
nop # 04
nop # 08
nop # 0C
nop # 10
nop # 14
nop # 18
nop # 1C
j dummy
start:
bnez x1, dummy
beq x0, x0, pass_0
li x31, 0
auipc x30, 0
j dummy
pass_0:
li x31, 1
bne x0, x0, dummy
bltu x0, x0, dummy
li x1, -1 # x1=FFFFFFFF
xori x3, x1, 1 # x3=FFFFFFFE
add x3, x3, x3 # x3=FFFFFFFC
add x3, x3, x3 # x3=FFFFFFF8
add x3, x3, x3 # x3=FFFFFFF0
add x3, x3, x3 # x3=FFFFFFE0
add x3, x3, x3 # x3=FFFFFFC0
add x3, x3, x3 # x3=FFFFFF80
add x3, x3, x3 # x3=FFFFFF00
add x3, x3, x3 # x3=FFFFFE00
add x3, x3, x3 # x3=FFFFFC00
add x3, x3, x3 # x3=FFFFF800
add x3, x3, x3 # x3=FFFFF000
add x3, x3, x3 # x3=FFFFE000
add x3, x3, x3 # x3=FFFFC000
add x3, x3, x3 # x3=FFFF8000
add x3, x3, x3 # x3=FFFF0000
add x3, x3, x3 # x3=FFFE0000
add x3, x3, x3 # x3=FFFC0000
add x3, x3, x3 # x3=FFF80000
add x3, x3, x3 # x3=FFF00000
add x3, x3, x3 # x3=FFE00000
add x3, x3, x3 # x3=FFC00000
add x3, x3, x3 # x3=FF800000
add x3, x3, x3 # x3=FF000000
add x3, x3, x3 # x3=FE000000
add x3, x3, x3 # x3=FC000000
add x5, x3, x3 # x5=F8000000
add x3, x5, x5 # x3=F0000000
add x4, x3, x3 # x4=E0000000
add x6, x4, x4 # x6=C0000000
add x7, x6, x6 # x7=80000000
ori x8, zero, 1 # x8=00000001
ori x28, zero, 31
srl x29, x7, x28 # x29=00000001
auipc x30, 0
bne x8, x29, dummy
auipc x30, 0
blt x8, x7, dummy
sra x29, x7, x28 # x29=FFFFFFFF
and x29, x29, x3 # x29=x3=F0000000
auipc x30, 0
bne x3, x29, dummy
mv x29, x8 # x29=x8=00000001
bltu x29, x7, pass_1 # unsigned 00000001 < 80000000
auipc x30, 0
j dummy
pass_1:
nop
li x31, 2
sub x3, x6, x7 # x3=40000000
sub x4, x7, x3 # x4=40000000
slti x9, x0, 1 # x9=00000001
slt x10, x3, x4
slt x10, x4, x3 # x10=00000000
auipc x30, 0
beq x9, x10, dummy # branch when x3 != x4
srli x29, x3, 30 # x29=00000001
beq x29, x9, pass_2
auipc x30, 0
j dummy
pass_2:
nop
# Test set-less-than
li x31, 3
slti x10, x1, 3 # x10=00000001
slt x11, x5, x1 # signed(0xF8000000) < -1
# x11=00000001
slt x12, x1, x3 # x12=00000001
andi x10, x10, 0xff
and x10, x10, x11
and x10, x10, x12 # x10=00000001
auipc x30, 0
beqz x10, dummy
sltu x10, x1, x8 # unsigned FFFFFFFF < 00000001 ?
auipc x30, 0
bnez x10, dummy
sltu x10, x8, x3 # unsigned 00000001 < F0000000 ?
auipc x30, 0
beqz x10, dummy
sltiu x10, x1, 3
auipc x30, 0
bnez x10, dummy
li x11, 1
bne x10, x11, pass_3
auipc x30, 0
j dummy
pass_3:
nop
li x31, 4
or x11, x7, x3 # x11=C0000000
beq x11, x6, pass_4
auipc x30, 0
j dummy
pass_4:
nop
li x31, 5
li x18, 0x20 # base addr=00000020
### uncomment instr. below when simulating on venus
# lui x18, 0x10000 # base addr=10000000
sw x5, 0(x18) # mem[0x20]=F8000000
sw x4, 4(x18) # mem[0x24]=40000000
lw x27, 0(x18) # x27=mem[0x20]=F8000000
xor x27, x27, x5 # x27=00000000
sw x6, 0(x18) # mem[0x20]=C0000000
lw x28, 0(x18) # x28=mem[0x20]=C0000000
xor x27, x6, x28 # x27=00000000
auipc x30, 0
bnez x27, dummy
lui x20, 0xA0000 # x20=A0000000
sw x20, 8(x18) # mem[0x28]=A0000000
lui x27, 0xFEDCB # x27=FEDCB000
srai x27, x27, 12 # x27=FFFFEDCB
li x28, 8
sll x27, x27, x28 # x27=FFEDCB00
ori x27, x27, 0xff # x27=FFEDCBFF
lb x29, 11(x18) # x29=FFFFFFA0, little-endian, signed-ext
and x27, x27, x29 # x27=FFEDCBA0
sw x27, 8(x18) # mem[0x28]=FFEDCBA0
lhu x27, 8(x18) # x27=0000CBA0
lui x20, 0xFFFF0 # x20=FFFF0000
and x20, x20, x27 # x20=00000000
auipc x30, 0
bnez x20, dummy # check unsigned-ext
li x31, 6
lbu x28, 10(x18) # x28=000000ED
lbu x29, 11(x18) # x29=000000FF
slli x29, x29, 8 # x29=0000FF00
or x29, x29, x28 # x29=0000FFED
slli x29, x29, 16
or x29, x27, x29 # x29=FFEDCBA0
lw x28, 8(x18) # x28=FFEDCBA0
auipc x30, 0
bne x28, x29, dummy
sw x0, 0(x18) # mem[0x20]=00000000
sh x27, 0(x18) # mem[0x20]=0000CBA0
li x28, 0xD0
sb x28, 2(x18) # mem[0x20]=00D0CBA0
lw x28, 0(x18) # x28=00D0CBA0
li x29, 0x00D0CBA0
auipc x30, 0
bne x28, x29, dummy
lh x27, 2(x18) # x27=000000D0
li x28, 0xD0
auipc x30, 0
bne x27, x28, dummy
pass_5:
li x31, 7
auipc x30, 0
bge x1, x0, dummy # -1 >= 0 ?
bge x8, x1, pass_6 # 1 >= -1 ?
auipc x30, 0
j dummy
pass_6:
auipc x30, 0
bgeu x0, x1, dummy # 0 >= FFFFFFFF ?
auipc x30, 0
bgeu x8, x1, dummy
auipc x20, 0
jalr x21, x0, pass_7 # just for test : (
auipc x30, 0
j dummy
pass_7:
# original test ends here
addi x20, x20, 8
auipc x30, 0
bne x20, x21, dummy
pass_8:
li x31, 8
addi x1, x0, 1
lui x7, 1
addi x2, x1, 2
lui x7, 1
beq x1, x0, dummy
addi x3, x2, 3
lui x7, 1
sw x3, 0(x0)
lui x7, 1
beq x3, x0, dummy
lw x4, 0(x0)
lui x7, 1
addi x5, x4, 4
lui x7, 1
beq x5, x0, dummy
addi x6, x5, 5
lui x7, 1
addi x7, x7, 15
addi x8, x0, 1
slli x8, x8, 12
sub x7, x7, x8
bne x7, x6, dummy
pass_9:
li x31, 9
addi x0, x1, 1
sub x1, x1, x1
bne x0, x1, dummy
pass_10:
li x31, 10
lui x1, 233
sw x1, 0(x0)
lw x2, 0(x0)
bne x1, x2, dummy
pass_11:
li x31, 11
addi x1, x0, 233
beq x0, x1, dummy
addi x1, x0, -1
blt x0, x1, dummy
bltu x1, x0, dummy
passed:
li x31, 0x666
j dummy
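This assembly program is adapted from the Lab 4-3 test program: four test points, pass_8 through pass_11, were added after the original last test point pass_7.
pass_9 mainly tests ordinary Use-Use hazards.
pass_10 mainly tests Load-Use, Use-Store, and Lui-Use hazards.
pass_11 mainly tests control hazards.
pass_8 combines all of the cases above: it mixes every instruction type into one longer block of code that contains all of these hazards. Specifically, several lw, sw, and lui instructions are inserted between the dependent (Use-Use) instructions, and branch instructions are interleaved as well.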
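The data hazards targeted by these test points can be listed mechanically. The following is a minimal Python sketch, not part of the lab code, that scans a short program for read-after-write dependencies. It assumes a 5-stage pipeline in which a result can still be "in flight" for the next 3 instructions; the tuple encoding `(op, rd, sources)` and the name `raw_hazards` are hypothetical and chosen only for illustration.

```python
def raw_hazards(program, window=3):
    """List RAW dependencies whose producer is at most `window` instructions back.

    Each instruction is (op, rd, sources); rd is None when nothing is written.
    Returns (producer_index, consumer_index, register) tuples.
    """
    found = []
    for j, (_, _, sources) in enumerate(program):
        for reg in sorted(set(sources)):
            if reg == "x0":  # x0 is hard-wired to zero, never a real dependency
                continue
            # Walk backwards so only the most recent producer is reported.
            for i in range(j - 1, max(-1, j - 1 - window), -1):
                if program[i][1] == reg:
                    found.append((i, j, reg))
                    break
    return found


# pass_10 from the test program (the leading `li x31, 10` is omitted).
pass_10 = [
    ("lui", "x1", ()),           # lui x1, 233
    ("sw", None, ("x1", "x0")),  # sw  x1, 0(x0)   -> Lui-Use / Use-Store
    ("lw", "x2", ("x0",)),       # lw  x2, 0(x0)
    ("bne", None, ("x1", "x2")), # bne x1, x2, dummy -> Load-Use on x2
]

for producer, consumer, reg in raw_hazards(pass_10):
    print(f"{pass_10[producer][0]} -> {pass_10[consumer][0]} on {reg}")
```

Whether each reported dependency costs a stall depends on the forwarding paths actually implemented; the sketch only identifies where a hazard can occur.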
- Simulation code
module sim();
reg clk;
reg rst;
testbench m0(.clk(clk), .rst(rst));
initial begin
clk = 1'b0;
rst = 1'b1;
#50;
rst = 1'b0;
end
always #10 clk = ~clk;
endmodule
Experimental results and analysis
Simulation results
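- After the simulation starts, register x31 is first set to 8. This shows that basic program execution works and that the original Lab 4-3 simulation tests pass.
- x31 then changes to 9, so test point 8 passes as expected. Test points 9, 10, and 11 are analyzed in detail below.
- For test point 9, x1 initially holds 1; the instruction x1 = x1 - x1 then subtracts the register from itself, and the result is finally checked against 0. In the waveform x1 changes to 0, so test point 9 passes.
- For test point 10, lui x1, 233 executes first; the value of x1 is then stored to mem[0], mem[0] is loaded into x2, and finally x1 and x2 are compared for equality. This sequence produces Lui-Use and Use-Store hazards, as well as the Load-Use hazard between lw and bne. In the waveform x1 first changes to 000e9000, which is 233 (hexadecimal E9) shifted left by 12 bits; x2 then also becomes 000e9000, so test point 10 passes.
- For test point 11, x1 is first set to 233 and is immediately followed by a branch instruction that uses x1; x1 is then set to -1 and is followed by two branch instructions that use x1. This sequence produces control hazards. In the waveform x1 first changes to 000000e9, the hexadecimal representation of 233, and then to ffffffff, so test point 11 passes.
- Finally x31 = 666 (hexadecimal), which shows that the whole simulation test passes.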
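On-board results
Discussion questions
Based on the pipeline you implemented, analyze each of the following two code snippets: whether there are hazards between instructions (if so, list each one), and what the CPI is when the code runs on your pipeline.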
TP-0:
addi x1, x0, 0
addi x2, x0, -1
addi x3, x0, 1
addi x4, x0, -1
addi x5, x0, 1
addi x6, x0, -1
addi x1, x1, 0
addi x2, x2, 1
addi x3, x3, -1
addi x4, x4, 1
addi x5, x5, -1
addi x6, x6, 1
TP-1:
addi x1, x0, 1
addi x2, x1, 2
addi x3, x1, 3
addi x4, x3, 4
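- For TP-0, there are no hazards between instructions, because no instruction depends on a nearby preceding one. Each addi xi, xi, c reads a register that was written six instructions earlier: it reads xi in the ID stage, adds the immediate in the EX stage, and writes back in the WB stage, and by the time xi is read the earlier result has already been written back. So \(Cycles = 12+5-1=16\), \(Instructions=12\), and \(CPI = 16/12 \approx 1.33\).
- For TP-1, there is a hazard between the first and second instructions (on x1), resolved by forwarding; a hazard between the first and third instructions (on x1), resolved by forwarding; and a hazard between the third and fourth instructions (on x3), resolved by forwarding. So \(Cycles = 4+5-1=8\), \(Instructions=4\), and \(CPI = 8/4=2\).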
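The cycle counts above follow the usual formula for an ideal pipeline: the first instruction needs one cycle per stage, and every further instruction completes one cycle later, plus any stall cycles. A small Python sketch of that arithmetic (the function names are hypothetical, and the 5-stage depth is taken from the analysis above):

```python
def pipeline_cycles(n_instructions, stages=5, stalls=0):
    """Total cycles for n instructions on a pipeline with the given depth."""
    return n_instructions + stages - 1 + stalls


def cpi(n_instructions, stages=5, stalls=0):
    """Cycles per instruction, including pipeline fill time."""
    return pipeline_cycles(n_instructions, stages, stalls) / n_instructions


print(pipeline_cycles(12), round(cpi(12), 2))  # TP-0
print(pipeline_cycles(4), cpi(4))              # TP-1
```

Because the fill time of `stages - 1` cycles is included, short programs such as TP-1 show a CPI well above 1 even without any stall; for long programs the value approaches 1.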
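Based on your implementation, simulate the following code on the testbench, give the simulation result, and state how many cycles it takes to complete all instructions. The signals clk, IF-PC, ID-PC, and the values of all registers used must be shown. Be sure to set the radix to hexadecimal and zoom so that all signal values are visible.
addi x1, x0, 1
addi x2, x1, 2
addi x3, x2, 3
sw x3, 0(x0)
lw x4, 0(x0)
addi x5, x4, 4
addi x6, x4, 5
The simulation result is as follows:
The program takes \(12\) cycles in total, including a \(1\)-cycle stall at cycle \(7\) caused by a Load-Use hazard.
Forwarding removes all other stalls, so the total is \(5+7-1+1=12\) cycles.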
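The count of \(12\) cycles can be cross-checked with a short Python sketch. It assumes full forwarding, so that the only stall is a one-cycle bubble when an instruction uses the result of the lw immediately before it; the tuple encoding and the name `count_load_use_stalls` are hypothetical.

```python
def count_load_use_stalls(program):
    """Count one bubble for every lw whose result is used by the next instruction."""
    stalls = 0
    for (op, rd, _), (_, _, next_sources) in zip(program, program[1:]):
        if op == "lw" and rd != "x0" and rd in next_sources:
            stalls += 1
    return stalls


# (op, rd, sources) for the seven instructions above.
program = [
    ("addi", "x1", ("x0",)),
    ("addi", "x2", ("x1",)),
    ("addi", "x3", ("x2",)),
    ("sw", None, ("x3", "x0")),
    ("lw", "x4", ("x0",)),
    ("addi", "x5", ("x4",)),  # uses x4 right after the load: one bubble
    ("addi", "x6", ("x4",)),  # one instruction later: forwarding is enough
]

stalls = count_load_use_stalls(program)
total_cycles = len(program) + 5 - 1 + stalls
print(stalls, total_cycles)
```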
Cache
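Procedure and experimental steps
Code hierarchy diagram and description
Set-associative mapping
- Main memory and the cache are divided into blocks of the same size.
- Main memory and the cache are divided into sets of the same size.
- The main memory capacity is an integer multiple of the cache capacity. Main memory is divided into regions the size of the cache, and each region has the same number of sets as the cache.
- When data is brought from main memory into the cache, the set numbers must match: a block from any region can be placed only in the cache set with the same set number, but it can go anywhere within that set. In other words, the mapping from a memory set to a cache set is direct, and the mapping within the two corresponding sets is fully associative.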
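As an illustration of this mapping, the Python sketch below splits an address into tag, set index, and word offset. The field widths (2 offset bits, 7 index bits, and the remaining 23 bits as tag) are assumed from the cache module in this report, which uses addr_cpu[1:0], addr_cpu[8:2], and addr_cpu[31:9]; the function name is hypothetical.

```python
def split_address(addr, index_bits=7, offset_bits=2):
    """Return (tag, index, offset) for a 32-bit address."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


for addr in (0x207, 0x20A, 0x208, 0x10000207):
    tag, index, offset = split_address(addr)
    print(f"{addr:08x}: tag={tag:x} index={index} offset={offset}")
```

Addresses 207 and 10000207 share the same index but have different tags, so they compete for the two ways of one set, while 20A and 208 fall into the neighbouring set.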
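LRU replacement algorithm
Based on how each block has been used, the least recently used block is always selected for replacement. This policy reflects program locality well and generally achieves a high hit rate.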
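With only two ways per set, LRU needs just one "recently used" bit per way. A minimal Python sketch of that idea follows; the names `choose_victim` and `touch` are hypothetical, and the tie-breaking rule (way 0 is chosen when its bit is 0) is an assumption that mirrors the check on the U bit of way 0 in the cache code of this report.

```python
def choose_victim(u_bits):
    """Pick the way to replace from the (U0, U1) recently-used bits."""
    return 0 if u_bits[0] == 0 else 1


def touch(way):
    """Return the new (U0, U1) bits after `way` has just been accessed."""
    return (1, 0) if way == 0 else (0, 1)


u_bits = (0, 0)                  # empty set after reset
first = choose_victim(u_bits)    # way 0 is filled first
u_bits = touch(first)
second = choose_victim(u_bits)   # way 0 was just used, so way 1 is next
print(first, second)
```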
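Write Back
Method: when the CPU performs a write, only the cache is written, not main memory; on a miss, a dirty block is written back to memory before it is replaced.
Advantage: higher speed
Disadvantage: lower reliability (main memory may temporarily hold stale data), and the control logic is more complex
Allocate
On a miss, the data is moved from memory into the cache.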
Source code
1. Data_ram code:
module Data_ram(
input wire clk, // clock
input wire en, // enable
input wire rst, // reset
input wire [6:0] addr, // address
input wire [127:0] din, // data write in
output wire [127:0] dout // data read out
// input wire [Index_width-1:0] addr, // address
// input wire [Block_width-1:0] din, // data write in
// output wire [Block_width-1:0] dout // data read out
);
parameter NUM_of_sets = 128;
parameter Block_width = 128;
parameter Index_width = 7;
//cache line memory: data for way0
reg [Block_width-1:0] cache_data [0:NUM_of_sets-1];
//Read and Write data to Cache
integer i;
always @(posedge clk or posedge rst) begin
if(rst) begin
for(i=0; i<NUM_of_sets; i=i+1) begin
cache_data[i] <= 128'b0;
end
end
else begin
if(en) begin
cache_data[addr] <= din;
end
else begin
cache_data[addr] <= cache_data[addr];
end
end
end
assign dout = cache_data[addr];
endmodule
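The Data_ram module implements \(128\) registers of \(128\) bits each to hold the cache data; each register stores one block of \(4\) words.
When rst = 1, the registers are cleared.
When en = 1, data can be written; otherwise the stored data is unchanged.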
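Since each entry packs \(4\) words into one \(128\)-bit value, reading or updating a single word is a shift-and-mask operation on the block. The Python sketch below models this with plain integers, placing word 0 in bits 31:0; the helper names are hypothetical, and the sample values are the ones used later in the testbench.

```python
WORD_MASK = 0xFFFFFFFF


def make_block(words):
    """Pack four 32-bit words into one 128-bit integer, word 0 in the low bits."""
    block = 0
    for i, word in enumerate(words):
        block |= (word & WORD_MASK) << (32 * i)
    return block


def read_word(block, offset):
    """Extract the word selected by the 2-bit offset."""
    return (block >> (32 * offset)) & WORD_MASK


def write_word(block, offset, value):
    """Return a copy of the block with one word replaced."""
    shift = 32 * offset
    return (block & ~(WORD_MASK << shift)) | ((value & WORD_MASK) << shift)


block = make_block([0xABABABAB, 0xCDCDCDCD, 0x12345678, 0x11451400])
updated = write_word(block, 3, 0x19198100)
print(f"{read_word(block, 3):08x} -> {read_word(updated, 3):08x}")
```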
2. Tag_ram code:
module Tag_ram(
input wire clk, // clock
input wire en, // enable
input wire rst, // reset
input wire [6:0] addr, // address
input wire [25:0] din, // data write in
output wire [25:0] dout // data read out
);
parameter NUM_of_sets = 128;
parameter Index_width = 7;
parameter TAG_width = 23;
parameter V = 1;
parameter U = 1;
parameter D = 1;
//cache line memory: tag,V,U,D for way0
reg [TAG_width+V+U+D-1:0] cache_TAG [0:NUM_of_sets-1];
integer i;
always @(posedge clk or posedge rst) begin
if(rst) begin
for(i=0; i<NUM_of_sets; i=i+1) begin
cache_TAG[i] <= {(TAG_width+V+U+D){1'b0}}; // clear the 26-bit entry (V, U, D, tag)
end
end
else begin
if(en) begin
cache_TAG[addr] <= din;
end
else begin
cache_TAG[addr] <= cache_TAG[addr];
end
end
//Read and Write TAG to Cache
end
assign dout = cache_TAG[addr];
endmodule
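The Tag_ram module implements \(128\) registers of \(26\) bits each to hold the cache tags and status flags. The most significant bit stores V, which indicates whether the entry is valid; the second bit, U, indicates whether the block was recently used; the third bit, D, indicates whether the data in the block has been modified, i.e. the dirty bit; the lowest \(23\) bits store the tag.
When rst = 1, the registers are cleared.
When en = 1, data can be written; otherwise the stored data is unchanged.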
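The entry layout can be written out explicitly. The Python sketch below packs and unpacks a \(26\)-bit entry with V in bit 25, U in bit 24, D in bit 23, and the tag in bits 22:0, following the description above; the function names are hypothetical.

```python
TAG_WIDTH = 23
TAG_MASK = (1 << TAG_WIDTH) - 1


def pack_entry(valid, used, dirty, tag):
    """Build a 26-bit tag entry: {V, U, D, tag[22:0]}."""
    return (valid << 25) | (used << 24) | (dirty << 23) | (tag & TAG_MASK)


def unpack_entry(entry):
    """Split a 26-bit tag entry into (valid, used, dirty, tag)."""
    return (entry >> 25) & 1, (entry >> 24) & 1, (entry >> 23) & 1, entry & TAG_MASK


# Valid, recently used, clean entry for tag 1, i.e. {3'b110, tag} in the Verilog.
entry = pack_entry(1, 1, 0, 1)
print(f"{entry:07x}", unpack_entry(entry))
```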
3. Cache code:
module cache(
input wire clk, // clock
input wire rst, // reset
input wire [31:0] data_cpu_write, // data write in
input wire [31:0] data_mem_read, // data read in
input wire [31:0] addr_cpu, // cpu addr
input wire wr_cpu, // cpu write enable
input wire rd_cpu, // cpu read enable
input wire ready_mem, // memory ready
output reg wr_mem, // memory write enable // write back
output reg rd_mem, // memory read enable
output reg [31:0] data_mem_write, // data to mem // write back
output reg [31:0] data_cpu_read, // data to cpu
output reg [31:0] addr_mem // memory addr
);
parameter NUM_of_sets = 128;
parameter Block_width = 128;
parameter Index_width = 7;
parameter TAG_width = 23;
parameter tag_width = 26;
parameter V = 1;
parameter U = 1;
parameter D = 1;
parameter IDLE = 0;
parameter CompareTag = 1;
parameter Allocate = 2;
parameter WriteBack = 3;
// Cache Controller State Machine and Logic
reg [1:0] state;
reg [1:0] next_state;
reg [TAG_width+V+U+D-1:0] wtag0, wtag1;
reg ent0, ent1, en0, en1;
wire [TAG_width+V+U+D-1:0] rtag0, rtag1;
reg [Block_width-1:0] wdata;
reg [Block_width-1:0] wdata_hit, rdata_hit;
reg [Block_width-1:0] wdata_miss;
wire [Block_width-1:0] rdata0, rdata1, rdata;
wire [TAG_width-1:0] tag, tag0, tag1;
wire [Index_width-1:0] index;
wire [1:0] offset;
wire hit, hit0, hit1;
wire valid0, valid1;
wire cpu_req_valid;
assign cpu_req_valid = (wr_cpu || rd_cpu);
assign tag = addr_cpu[31:9];
assign index = addr_cpu[8:2];
assign offset = addr_cpu[1:0];
assign valid0 = rtag0[TAG_width+V+U+D-1];
assign valid1 = rtag1[TAG_width+V+U+D-1];
assign tag0 = rtag0[TAG_width-1:0];
assign tag1 = rtag1[TAG_width-1:0];
assign hit0 = ((tag0 == tag) && valid0);
assign hit1 = ((tag1 == tag) && valid1);
assign hit = (hit0 || hit1);
assign rdata = hit0 ? rdata0 : (hit1 ? rdata1 : 128'b0);
reg [2:0] mem_ready, next_mem_ready;
reg [Block_width-1:0] mem_data;
wire dirty;
assign dirty = (rtag0[24:23] == 2'b01) || (rtag1[24:23] == 2'b01);
always@(*) begin
case(offset)
2'd0: rdata_hit = rdata[31:0];
2'd1: rdata_hit = rdata[63:32];
2'd2: rdata_hit = rdata[95:64];
2'd3: rdata_hit = rdata[127:96];
endcase
case(offset)
2'd0: wdata_hit = {rdata[127:32], data_cpu_write};
2'd1: wdata_hit = {rdata[127:64], data_cpu_write, rdata[31:0]};
2'd2: wdata_hit = {rdata[127:96], data_cpu_write, rdata[63:0]};
2'd3: wdata_hit = {data_cpu_write, rdata[95:0]};
endcase
end
Data_ram d0 (
.clk(clk),
.rst(rst),
.addr(index),
.din(wdata),
.en(en0),
.dout(rdata0)
);
Data_ram d1 (
.clk(clk),
.rst(rst),
.addr(index),
.din(wdata),
.en(en1),
.dout(rdata1)
);
Tag_ram t0 (
.clk(clk),
.rst(rst),
.addr(index),
.din(wtag0),
.en(ent0),
.dout(rtag0)
);
Tag_ram t1 (
.clk(clk),
.rst(rst),
.addr(index),
.din(wtag1),
.en(ent1),
.dout(rtag1)
);
always@(posedge clk or posedge rst) begin
if(rst) begin
state <= IDLE;
mem_ready <= 3'd0;
end
else begin
state <= next_state;
mem_ready <= next_mem_ready;
end
end
always@(*) begin
case(state)
IDLE: begin
next_mem_ready = 3'd0;
en0 = 1'b0;
en1 = 1'b0;
ent0 = 1'b0;
ent1 = 1'b0;
wtag0 = 0;
wtag1 = 0;
wr_mem = 0;
rd_mem = 0;
data_mem_write = 0;
//data_cpu_read = 0;
addr_mem = 0;
if(cpu_req_valid) begin
next_state = CompareTag;
end
else begin
next_state = IDLE;
end
end
CompareTag: begin
next_mem_ready = 3'd0; // drive the next-state value; mem_ready itself is registered in the clocked block
if(hit) begin
next_state = IDLE;
ent0 = 1'b1;
ent1 = 1'b1;
if(wr_cpu) begin // write hit
wdata = wdata_hit;
if(hit0) begin
en0 = 1'b1;
en1 = 1'b0;
wtag0 = {3'b111, rtag0[22:0]}; // recently used, set dirty
wtag1 = {rtag1[25], 1'b0, rtag1[23:0]}; // not recently used
end
else begin
en0 = 1'b0;
en1 = 1'b1;
wtag0 = {rtag0[25], 1'b0, rtag0[23:0]}; // not recently used
wtag1 = {3'b111, rtag1[22:0]}; // recently used, set dirty
end
end
else begin // read hit
data_cpu_read = rdata_hit;
en0 = 1'd0;
en1 = 1'd0;
wtag0 = {rtag0[25], hit0, rtag0[23:0]};
wtag1 = {rtag1[25], hit1, rtag1[23:0]};
end
end
else begin
if(dirty) begin
next_state = WriteBack;
end
else begin
next_state = Allocate;
end
end
end
Allocate: begin // read/write miss
if(mem_ready >= 3'd4) begin
rd_mem = 0;
next_mem_ready = 3'd0;
next_state = CompareTag;
if(rtag0[24] == 1'b0) begin // not recently used
en0 = 1'b1;
ent0 = 1'b1;
en1 = 1'b0;
ent1 = 1'b0;
wtag0 = {3'b110, tag}; // recently used, set clean
wdata = mem_data;
end
else begin
en0 = 1'b0;
ent0 = 1'b0;
en1 = 1'b1;
ent1 = 1'b1;
wtag1 = {3'b110, tag}; // recently used, set clean
wdata = mem_data;
end
end
else begin // get data from memory
en0 = 1'd0;
en1 = 1'd0;
ent0 = 1'd0;
ent1 = 1'd0;
rd_mem = 1;
next_state = Allocate;
if(ready_mem)
next_mem_ready = mem_ready + 1;
else
next_mem_ready = mem_ready;
case(mem_ready)
3'd0: mem_data[31:0] = data_mem_read;
3'd1: mem_data[63:32] = data_mem_read;
3'd2: mem_data[95:64] = data_mem_read;
3'd3: mem_data[127:96] = data_mem_read;
endcase
end
end
WriteBack: begin // carry dirty data to memory
if(mem_ready >= 3'd4) begin
wr_mem = 0;
next_mem_ready = 3'd0;
en0 = 1'd0;
en1 = 1'd0;
ent0 = 1'd0;
ent1 = 1'd0;
next_state = Allocate;
end
else begin
wr_mem = 1;
next_state = WriteBack;
if(ready_mem)
next_mem_ready = mem_ready + 1;
else
next_mem_ready = mem_ready;
if(rtag0[24] == 1'b0) begin // not recently used
en0 = 1'b1;
ent0 = 1'b1;
en1 = 1'b0;
ent1 = 1'b0;
case(mem_ready)
3'd0: begin addr_mem = {tag0, index, 2'b00}; data_mem_write = rdata0[31:0]; end
3'd1: begin addr_mem = {tag0, index, 2'b01}; data_mem_write = rdata0[63:32]; end
3'd2: begin addr_mem = {tag0, index, 2'b10}; data_mem_write = rdata0[95:64]; end
3'd3: begin addr_mem = {tag0, index, 2'b11}; data_mem_write = rdata0[127:96]; end
endcase
end
else begin
en0 = 1'b0;
ent0 = 1'b0;
en1 = 1'b1;
ent1 = 1'b1;
case(mem_ready)
3'd0: begin addr_mem = {tag1, index, 2'b00}; data_mem_write = rdata1[31:0]; end
3'd1: begin addr_mem = {tag1, index, 2'b01}; data_mem_write = rdata1[63:32]; end
3'd2: begin addr_mem = {tag1, index, 2'b10}; data_mem_write = rdata1[95:64]; end
3'd3: begin addr_mem = {tag1, index, 2'b11}; data_mem_write = rdata1[127:96]; end
endcase
end
end
end
default: next_state = IDLE;
endcase
end
endmodule
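The cache module instantiates \(4\) RAM modules, which store the data and tags of the two ways, and implements the finite state machine of the cache controller:
When a read or write request arrives from the CPU, the controller moves from IDLE to the CompareTag state and checks whether the set selected by index contains a matching tag. If it does, the access is a hit: the controller returns to the IDLE state and updates the V and U flag bits, and on a write it also sets the Dirty flag.
On a miss, the data must be fetched from memory into the cache.
If the selected set is already full, the block that was not recently used is replaced.
If that block has been modified, the modified data must first be stored back to memory, i.e. write back.
The data is then moved from memory into the cache, i.e. allocate.
After the data has been moved, the tags are compared again. The access is now a hit and is handled in the same way as any other hit.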
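The state transitions described above can be summarized as a next-state function. The Python sketch below is a simplified model rather than a translation of the Verilog: it assumes that `words_done` counts the words already transferred for the current block (the role of mem_ready in the code) and that a transfer finishes after \(4\) words; the function and constant names are hypothetical.

```python
IDLE, COMPARE_TAG, ALLOCATE, WRITE_BACK = range(4)


def next_state(state, cpu_req, hit, dirty, words_done):
    """Next state of the cache controller for the current inputs."""
    if state == IDLE:
        return COMPARE_TAG if cpu_req else IDLE
    if state == COMPARE_TAG:
        if hit:
            return IDLE
        # Miss: a dirty victim must be written back before allocating.
        return WRITE_BACK if dirty else ALLOCATE
    if state == ALLOCATE:
        # After all 4 words arrive, compare tags again (now a hit).
        return COMPARE_TAG if words_done >= 4 else ALLOCATE
    if state == WRITE_BACK:
        return ALLOCATE if words_done >= 4 else WRITE_BACK
    return IDLE


# A miss on a dirty block: IDLE -> CompareTag -> WriteBack -> Allocate -> CompareTag.
path = [IDLE]
path.append(next_state(path[-1], True, False, True, 0))
path.append(next_state(path[-1], True, False, True, 0))
path.append(next_state(path[-1], True, False, True, 4))
path.append(next_state(path[-1], True, False, False, 4))
print(path)
```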
Key simulation steps
1. testbench code:
module cache_tb();
reg clk; // clock
reg rst; // reset
reg [31:0] data_cpu_write; // data write in
reg [31:0] data_mem_read; // data read in
reg [31:0] addr_cpu; // cpu addr
reg wr_cpu; // cpu write enable
reg rd_cpu; // cpu read enable
reg ready_mem; // memory ready
wire wr_mem; // memory write enable // write back
wire rd_mem; // memory read enable
wire [31:0] data_mem_write; // data to mem // write back
wire [31:0] data_cpu_read; // data to cpu
wire [31:0] addr_mem; // memory addr
cache c(.clk(clk), .rst(rst), .data_cpu_write(data_cpu_write),
.data_mem_read(data_mem_read), .addr_cpu(addr_cpu),
.wr_cpu(wr_cpu), .rd_cpu(rd_cpu), .ready_mem(ready_mem),
.wr_mem(wr_mem), .rd_mem(rd_mem), .data_mem_write(data_mem_write),
.data_cpu_read(data_cpu_read), .addr_mem(addr_mem)
);
initial begin
wr_cpu = 0;
rd_cpu = 0;
clk = 1;
rst = 1;
ready_mem = 1;
#60;
rst = 0;
#40;
//write miss
wr_cpu = 1'd1;
addr_cpu = 32'h00000207;
data_cpu_write = 32'h19198100;
#20;
#20;
data_mem_read = 32'habababab;
#20;
data_mem_read = 32'hcdcdcdcd;
#20;
data_mem_read = 32'h12345678;
#20;
data_mem_read = 32'h11451400;
#200;
wr_cpu = 1'd0;
//read hit
rd_cpu = 1'd1;
addr_cpu = 32'h00000207;
#100;
rd_cpu = 1'd0;
//write hit
wr_cpu = 1'd1;
addr_cpu = 32'h00000207;
data_cpu_write = 32'hdeadbeef;
#120;
wr_cpu = 0;
//read miss
rd_cpu = 1'd1;
addr_cpu = 32'h0000020A;
#20;
#20;
data_mem_read = 32'haaaaaaaa;
#20;
data_mem_read = 32'hbbbbbbbb;
#20;
data_mem_read = 32'hcccccccc;
#20;
data_mem_read = 32'hdddddddd;
#40;
rd_cpu = 1'd0;
#100;
//read hit
rd_cpu = 1'd1;
addr_cpu = 32'h00000208;
#40;
rd_cpu = 1'd0;
#100;
end
always #10 clk = ~clk;
endmodule
Read and write accesses are each simulated for both the hit and the miss case.
Experimental results and analysis
Simulation results
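1. The cache is initially empty. The value 19198100 is then written to address 207. Because the entry for 207 is empty, the cache interacts with memory for \(4\) cycles to obtain the corresponding \(4\) words and then writes these \(4\) words into the cache, which completes the allocate stage.
2. The tags are then compared again and the access is a hit, so the data in the corresponding block is updated and dirty = 1 is set. The word 11451400 in the block is replaced by 19198100.
3. Next, the \(4\)th word of the block at address 207 is read and returns 19198100, which is a read hit.
4. The data at address 207 is then changed to deadbeef, which is a write hit.
5. The address is then changed to 20A and read, which is a read miss, so the cache interacts with memory for \(4\) cycles and writes the data into the cache.
6. The controller then re-enters the CompareTag stage and sees a read hit; the data cccccccc is read out.
7. Address 208 is read next, which is also a read hit; the data aaaaaaaa is read out.
8. Address 10000207 is written next. The tag has changed, so a new allocate takes place and \(4\) words are read.
9. A write hit follows: the recently-used bit of way \(0\) is set to U = 0, way \(1\) is set to U = 1, D = 1, and the data 20241225 is written.
10. Address 20000207 is written next, which is a miss. Because ways \(0\) and \(1\) are both full, the not recently used way \(0\) is selected for write back, and \(4\) words are written to memory.
11. An allocate follows, reading \(4\) words.
12. Finally a write hit occurs and the data 07210721 is written.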
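The sequence of hits and misses above can be reproduced with a small behavioral model. The Python sketch below is a simplified stand-in for the Verilog cache, not a cycle-accurate copy: it tracks only tags, dirty flags, and recently-used flags for a 2-way cache with the same address split, and it reports a miss together with the hit that follows the allocate as a single "miss". The class name and the result strings are hypothetical.

```python
class TwoWayCache:
    """Tag-only model of a 2-way, write-back, write-allocate cache."""

    def __init__(self, index_bits=7, offset_bits=2):
        self.index_bits = index_bits
        self.offset_bits = offset_bits
        self.sets = {}  # index -> [way0, way1], each None or a dict

    def access(self, addr, write):
        index = (addr >> self.offset_bits) & ((1 << self.index_bits) - 1)
        tag = addr >> (self.offset_bits + self.index_bits)
        ways = self.sets.setdefault(index, [None, None])
        for way, line in enumerate(ways):
            if line is not None and line["tag"] == tag:
                line["dirty"] = line["dirty"] or write
                self._touch(ways, way)
                return "hit"
        # Miss: replace way 0 unless it was the recently used one.
        victim = 0 if (ways[0] is None or not ways[0]["used"]) else 1
        old = ways[victim]
        needs_write_back = old is not None and old["dirty"]
        ways[victim] = {"tag": tag, "dirty": write, "used": True}
        self._touch(ways, victim)
        return "miss+writeback" if needs_write_back else "miss"

    @staticmethod
    def _touch(ways, used_way):
        for way, line in enumerate(ways):
            if line is not None:
                line["used"] = (way == used_way)


trace = [
    (0x00000207, True),   # steps 1-2: write miss, then allocate
    (0x00000207, False),  # step 3: read hit
    (0x00000207, True),   # step 4: write hit
    (0x0000020A, False),  # steps 5-6: read miss
    (0x00000208, False),  # step 7: read hit
    (0x10000207, True),   # steps 8-9: write miss into the other way
    (0x20000207, True),   # steps 10-12: miss with write back of way 0
]
cache = TwoWayCache()
results = [cache.access(addr, write) for addr, write in trace]
print(results)
```

Only the last access evicts a dirty block, which matches the single write back observed in step 10.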