

## SD/MMC Bootloader Specification

Author: Arnim Läuger arniml@opencores.org

> Rev. 3.2 March 16, 2006



This page has been intentionally left blank.



#### **Revision History**

| Rev. | Date        | Author       | Description                                |
|------|-------------|--------------|--------------------------------------------|
| 1.0  | 27-Feb-2005 | Arnim Läuger | First Version                              |
| 2.0  | 09-Mar-2005 | Arnim Läuger | Update for set concept                     |
| 3.0  | 14-Apr-2005 | Arnim Läuger | Handshaking information                    |
| 3.1  | 16-Apr-2005 | Arnim Läuger | Formatting                                 |
| 3.2  | 16-Mar-2006 | Arnim Läuger | Detailed description of generic parameters |
|      |             |              | Formatting and typos fixed                 |



### Contents

| INTRODUCTION | 1 |
|--------------|---|
| ARCHITECTURE | 2 |
| OPERATION    | 5 |
| INTEGRATION  | 8 |
| IO PORTS     |   |



## 1

## Introduction

The SD/MMC Bootloader is a CPLD design that manages configuration and bootstrapping of FPGAs. It is able to retrieve the required data from SecureDigital (SD) cards or MultiMediaCards (MMC) and manages the FPGA configuration process. SD cards as well as MMCs are operated in SPI mode which is part of both standards thus eliminating the need for dedicated implementations. The SD/MMC Bootloader fits both. Beyond configuration, this core supports a bootstrapping strategy where multiple images are stored on one single memory card.

For example consider a system completely based on SRAM. The bootloader provides an initial configuration data from the first image to the FPGA. This image contains a design which pulls the next image from the memory card and transfers this data to SRAM. In the last step, the final FPGA design is loaded from the third image.

#### Features

- Configuration mode: configures SRAM based FPGAs via slave serial mode (Xilinx and Altera)
- Data mode: provides stored data over a simple synchronous serial interface
- Broad compatability using SPI mode
  - SecureDigital cards using dedicated initialization command
  - MultiMediaCards (see below)
- Operation triggered by power-up or card insertion

The SD/MMC Bootloader project is maintained and released on the OpenCores web server at

http://www.opencores.org/projects.cgi/web/spi\_boot/overview/

Updates of this core can be obtained via the project pages.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPE-CIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCURE-MENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTER-RUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.



## 2 Architecture

The architecture of the SD/MMC Bootloader is depicted in Figure 1. It consists of the controller and command FSMs, an SPI port, two config/data ports and three counters.



Figure 1: SD/MMC Bootloader block diagram

#### **Controller FSM**

The controller FSM manages the overall functionality of the core. On one hand, this includes the complete SD/MMC SPI mode protocol with initialization, data retrieval and abort. On the other hand, configuration and data requests are handled.

#### **Command FSM**

The command FSM sequences each single SPI mode command and generates the corresponding bit stream. Each sequence consists of the command itself, the card's response and optional data. Whenever this sequence has finished, the controller FSM is flagged, triggering it to step to the next command.

#### **Bit Counter**

The bit counter provides a generic counting service to the command FSM. It times each part of a command sequence and generates an overflow indicator to the command FSM.



#### Image Counter

The image counter tracks the number of the current image. It increments according to the instructions of the controller FSM.

#### **MMC Compatability Counter**

For full compatability with the MMC standard it is required to initialize the card with a maximum clock frequency of 400 kHz. The clock division is done with the MMC compatibility counter which signals its overflow to the bit counter. As soon as the initialization phase has finished, the MMC compat counter is disabled by the controller FSM.

#### SPI Port

The SPI port connects to the pins of the SD or MM card according to Table 1.

| Signal       | Connector      | Description                    |
|--------------|----------------|--------------------------------|
| spi_clk      | Pin 5, CLK     | Clock                          |
| spi_cs_n     | Pin 1, CS      | Chip Select (Active low)       |
| spi_data_in  | Pin 7, DataOut | Card to Host Data and Status   |
| spi_data_out | Pin 2, DataIn  | Host to Card Commands and Data |

Table 1: SD/MMC connections of SPI port

Furthermore, this port contains an output enable signal to put all outputs to tri-state.

#### **Configuration Port**

This port interfaces to the configuration facilities of the FPGA. It matches both Altera and Xilinx products. The mapping is given in Table 2.

| Signal     | Altera    | Xilinx   | Description                           |
|------------|-----------|----------|---------------------------------------|
| config_n   | nCONFIG   | PROGRAM# | Initiates FPGA configuration sequence |
| cgf_init_n | nSTATUS   | INIT#    | Rising edge indicates end of init     |
| cfg_done   | CONF_DONE | DONE     | Loading the configuration completed   |
| cfg_clk    | DCLK      | CCLK     | Configuration clock                   |
| cfg_dat    | DATA0     | DIN      | Configuration data                    |

**Table 2: Configuration port mapping** 

#### Data Port

The data port accepts control signals that control the sequence when reading multiple images and sets from the card. Table 3 describes their meaning.

| Signal   | Description                                                                              |
|----------|------------------------------------------------------------------------------------------|
| start    | Initiates configuration sequence when asserted low                                       |
| mode     | Mode selector: $0 \rightarrow \text{configuration mode}, 1 \rightarrow \text{data mode}$ |
| dat_done | Loading in data mode completed                                                           |
| detached | Indication that the SPI port outputs are tri-stated                                      |
| set_sel  | Set selection                                                                            |

#### Table 3: Data port signals

#### Memory Organization

Data on the card is paritioned into sets, with each set consisting of one or more images. Sets are a static containers while images form the dynamic part of this two-fold scheme. The core automatically increments its image counter whenever an image is requested via



the start trigger. Hence the dynamic attribute of images. The address offset for the images involved is calculated by the set selection inputs. They are applied externally and are static during a bootstrapping sequence.

Figure 2 shows the generic memory organization.



#### Figure 2: Memory organization

Parameters in Figure 2 are defined as follows (see also Table 4)

- bpi, bits per image: Number of bits required for the image address range.
   bpi = 18 → image contains 2<sup>18</sup> = 256 KB
- i: Number of images within a set.
- m: Width of set counter. m =  $2 \rightarrow 2^2 = 4$  images
- s: Number of sets on card.
- p: Width of set selection vector.  $p = 4 \Rightarrow 2^4 = 16$  sets



# **Operation**

The SD/MMC Bootloader has three operation states coupled to the interaction with the memory card. After reset, the core is in initialization state and automatically configures the SD or MM card. Next is the idle state where the core deactivates its outputs on the SPI interface. Upon an external request, the core switches to the transfer state and requests data from the memory card. The transfer state itself has two modes for either configuring an FPGA or simply passing through the card's data.

#### **Initialization State**

The flow diagram of the initialization state is shown in Figure 3.



**Figure 3: Initialization flow diagram** 



After reset, the core remains in the power up state spending a total of 144 clock cycles before the SPI interface is activated. This is more than twice the time the card needs to initialize its internal states (specified to 64 clock cycles). The extra time is to eliminate uncertainties both in power ramp up and card start up. If the application is time critical and there is an external power supply monitor it is safe to reduce the power up time to something around 74 clocks. This modification has to be applied to the VHDL source code of the core.

The first command issued by the core is GO\_IDLE\_STATE (CMD0) with parallel assertion of CS. This resets the card and puts it in SPI mode. Them the core sends the command APP\_CMD (CMD55) to escape the next extended command. MultiMediaCards will respond to this with an illegal command error. The core detects this and uses CMD1 in the further process. In both ways (CMD55 + ACMD41 and CMD1) the idle status of the card is polled repeatedly. Once it left idle state, the core sets the desired block length with SET\_BLOCKLEN (CMD16). The block length is derived from the generic parameter width\_bit\_cnt (refer to Table 4).

The core is now idle and SPI output signals are tri-stated. Output pin detached is '1'.

#### **Transfer State**

Whenever a start trigger is detected, the core turns to transfer state. The trigger consists of a low-to-high transition of the start input. To allow automatic operation, the core also treats a constant high level at start as a trigger after reset. I.e. when the core reaches the idle state for the first time, it continues immediately when start is high. For subsequent loops through transfer and idle state start has to go low and high again.

When going from idle to transfer mode, the core samples the mode input which determines whether the transfer should be done in configuration or data mode.



Figure 4: Transfer flow diagram



Configuration and data mode are the same except that in configuration mode a configuration cycle for the FPGA is generated. This cycle is similar for Altera and Xilinx devices (also refer to Table 2):

- 1. Activation of config\_n  $\rightarrow$  configuration memory is cleared
- 2. Wait for low level on cfg\_init\_n  $\rightarrow$  FPGA acknowledges assertion of config\_n
- 3. Wait for high level on cfg\_init\_n  $\rightarrow$  memory cleared, ready for configuration data

This sequence is skipped for data mode. The core continues with the data transfer itself in both modes.

First step is to request a block of data by sending READ\_MULTIPLE\_BLOCKS command (CMD18). The address of the first block depends on the selected set and the current value of the image counter. Starting from 0, it counts the number of images transferred and the start address of an image is derived from the following formula

 $start\_address=2^{num\_bits\_per\_image}*(cnt_{img}+set*2^{width\_img\_cnt})$ 

As soon as the card has retrieved the specified block, the bit stream is presented at the cfg\_clk and cfg\_dat outputs. Data at cfg\_dat can be sampled with the rising edge of cfg\_clk. The card now sends one block after another without intervention of the core. Configuration clock and data outputs are only operated when there is a valid bit stream from the card. I.e. during gaps between two blocks cfg\_clk remains on high level. It changes from high to low for the next valid bit on cfg\_dat.

This sequence is terminated when both cfg\_done and dat\_done are activated. The configuration clock is stopped immediately and the core sends STOP\_TRANSMISSION (CMD12) to the card. It may take some time before the core is finally in idle state again depending on the block size and the time the done signals have been activated.



## 4 Integration

This chapter provides informations on the integration of the SD/MMC Bootloader in an FPGA system.

#### **Configuration Timing**

As written above, the configuration clock is stopped immediately when the core samples a high level on both cfg\_done and dat\_done. The intention is to prevent any unwanted data bits to be transferred to the FPGA. This scheme requires on the other hand that the FPGA has terminated its configuration process at this time. For Spartan IIe devices (and probably others) it is therefore necessary to program the DONE pin to the last cycle of the startup sequence (one clock cycle earlier is probably also ok). With the default settings of Xilinx WebPack the FPGA will not be able to finish its startup sequence because DONE is asserted too early.

There is no experience so far with Altera devices.

#### Writing Data to the Card

Downloading the configuration data to the card is a straight forward process. The images have to be written starting at dedicated locations. For the provided toplevel designs, these locations are multiples of 256 K. I.e. 0, 0x40000, 0x80000 and so forth.

dd (part of the GNU coreutils) serves this purpose:

```
$ dd if=ram_loader.bin of=/dev/sdX bs=512
```

\$ dd if=pongrom\_6.bin of=/dev/sdX bs=512 seek=512

```
$ dd if=pacman.bin of=/dev/sdX bs=512 seek=1024
```

The name of the device node depends on how the card reader is attached to the kernel. For Linux systems this is most often something like /dev/sdX with X ranging from a-z. Please note that it is essential to use the device without any trailing numbers as these refer to partitions leading to wrong offsets for data written to the card.

All this works perfectly for my Spartan IIe device as this FPGA expects the configuration data as it is delivered from the card: Consecutive bytes each with its most significant bit first. Altera devices like the FLEX family are different here. They expect the bytes with least significant bit first. Therefore, the configuration data has to be swapped bitwise before it is written to the card.



#### Schematic

A sample schematic for embedding SD/MMC Bootloader in an FPGA system is provided in spi\_boot\_schematic.pdf. I use it to configure/boot the Xilinx Spartan IIe on BurchED's B5-X300 board. SV2 fits the "SERIAL MODE" connector on this board but you will have to add a separate wire from R6 to attach INIT. Please check the proper use of the pull-up resistors for your specific board.

Only the configuration port has to be connected to the FPGA even in case the data mode is not required and only one single configuration sequence should be applied. The core will automatically start configuration as start and mode inputs are pulled high. When configuration has finished (FPGA sets cfg\_done), the core will remain in the idle state because there is no further low-to-high transition at start.

#### Interfacing

The config port is connected to the respective FPGA as described in Table 2. The port is intended for configuration with slave serial mode where the config clock is supplied by the SD/MMC Bootloader core.

The interface of the data port consists of the start and mode inputs that are used to trigger the transfer and to select the mode. Subsequent retrieval of images requires a low-to-high transition of start. When start is brought to '1', mode has to be set to the required transfer mode. The core will when retrieve the image from the card and will apply valid data at cfg\_dat with the falling edge of cfg\_clk (in both configuration and data mode). Termination of the data transfer is indicated by dat\_done set to '1'. The core will then finish the access to the card and will return to idle state where it waits for a new start trigger. In idle state, the core tri-states all outputs of the SPI port and indicates this by setting the output detached to '1'.

A design that requests one or more images can monitor detached to retrieve information of the core's state. This output serves also as a handshake indicator for the start trigger. The start input is sampled while detached is at '1'. When detached has switched to '0', the FPGA design can safely pull start back to '0' to prepare the next transfer which is typically initiated after asserting dat\_done.

#### Generic Parameters

The spi\_boot core can be tailored to the requirements of the user by several parameters. This section discusses the toplevel generics listed in Table 4 in detail.

- width\_set\_sel\_g Defines the width of the set selection input vector. Ajust this to the number of sets you intend to use.
- width\_bit\_cnt\_g

Defines the width of the bit counter. To handle the full SPI protocol, the minimum width is 6 bits. This results in the smallest possible resource requirement for the counter while imposing the largest decrease on read performance.

In order to share CPLD resources, the bit counter is also used to time the length of a read block. The minimum of 6 selects a block length of 8 bytes. A counter width of 12

<sup>1</sup> Both dat\_done and cfg\_done have to be '1' to actually terminate the transfer.



allows to minimize the protocol overhead by increasing the block length to its maximum of 512 bytes in SPI mode.

width\_img\_cnt\_g

Defines the width of the image counter and thus the number of images. 0 selects one image per set and results in lower resource usage when multi set operation is not required.

num\_bits\_per\_image\_g

Defines the number of bits to address one single image. 2<sup>num\_bits\_per\_image\_g</sup> is the size of one image in bytes.

sd\_init\_g

SD cards require CMD55/ACMD41 instead of CMD1 for initialization. Setting this parameter to one instructs the core to also probe with CMD55/ACMD41. When using only MMC, sd\_init\_g can be set to 0 to decrease resource usage.

mmc\_compat\_clk\_div\_g

Defines rollover value of the MMC compatibility clock divider. Old MM cards and some SD cards require that the clock frequency during initialization phase is not higher than 400 kHz. Select a value which satisfies

 $400 \text{ kHz} \ge f(\text{clk}_i) / (\text{mmc_compat_clk}_div_g * 2)$ 

For cards that accept initialization with a faster clock frequency, this parameter can be set to 0 to decrease resources usage.

width\_mmc\_clk\_div\_g

Defines the width of the MMC compatability clock divider. Adjust to  $2^{width\_mmc\_clk\_div\_g} \ge mmc\_compat\_clk\_div\_g.$ 

#### Compatability

These cards have been tested with the SD/MMC Bootloader:

- Cannon 8 MB SD
- Hama 64 MB SD
- Panasonic 32 MB SD
- SanDisk 128 MB SD
- SanDisk 64 MB MMC
- SanDisk 64 MB SD

Some MMCs might fail with this core as not all cards support CMD18 (READ\_MULTI-PLE\_BLOCK). Please consult the data sheet of your specific model. In case your MMC does not implement CMD18 you might want to have a look at the FPGA MMC-Card Config project at <u>http://www.opencores.org/projects.cgi/web/mmcfpgaconfig/overview/</u>



## 5 IO Ports

| Port           | Width  | Dir                                                | Description                                     |  |  |
|----------------|--------|----------------------------------------------------|-------------------------------------------------|--|--|
| clk_i          | 1      | In                                                 | Clock input                                     |  |  |
| reset_i 1      |        | In                                                 | Reset input – active level selected via generic |  |  |
| set_sel_i 1-s  |        | In                                                 | Set selection input                             |  |  |
| spi_clk_o 1    |        | Out                                                | SPI clock output                                |  |  |
| spi_cs_n_o     | 1      | Out                                                | SPI chip select – low active                    |  |  |
| spi_data_in_i  | 1      | In                                                 | SPI data from card                              |  |  |
| spi_data_out_o | 1      | Out                                                | SPI data to card                                |  |  |
| spi_en_outs_o  | 1      | Out                                                | Tristate driver enable for SPI outputs          |  |  |
| start_i        | 1      | In                                                 | Start trigger                                   |  |  |
| mode_i         | 1      | In                                                 | Mode selection                                  |  |  |
| config_n_o     | 1      | Out                                                | Begin configuration                             |  |  |
| cfg_init_n_i   | 1      | In                                                 | Configuration init handshake – low active       |  |  |
| cfg_done_i     | 1      | In                                                 | Configuration done                              |  |  |
| dat_done_i     | 1      | In                                                 | Data transfer done                              |  |  |
| datached_o     | 1      | Out                                                | SPI port outputs tri-stated                     |  |  |
| cfg_clk_o      | 1      | Out                                                | Configuration clock output                      |  |  |
| cfg_dat_o      | 1      | Out                                                | Configuration data                              |  |  |
| Generic        | Value  | Description                                        |                                                 |  |  |
| width_set_sel  | 1 – p  | Width of set selection, $2^{p} = s$ sets available |                                                 |  |  |
| width_bit_cnt  | 6 – 12 | Width of bit counter                               |                                                 |  |  |
| width_img_cnt  | 0 – m  | Width                                              | of image counter, $2^{m} = i$ images available  |  |  |
| num_bits_per_  | 0-31   | Number of bits required to address one image       |                                                 |  |  |
| img            |        |                                                    |                                                 |  |  |
| sd init 0, 1   |        | SD specific initialization command                 |                                                 |  |  |
|                |        | 1 : use ACMD41                                     |                                                 |  |  |
|                |        | 0 : do not use ACMD41                              |                                                 |  |  |
| mmc_compat_    | 0 – n  | Maximum count for MMC compatibility counter        |                                                 |  |  |
| clk_div        |        | 0 : do not implement MMC compatibility counter     |                                                 |  |  |
| width_mmc_cl   | 0 – o  | Width                                              | vidth of MMC compatibility counter              |  |  |
| k_div          |        |                                                    |                                                 |  |  |
| reset_level    | 0, 1   | Active                                             | ve level of reset_i                             |  |  |

The following Table 4 shows the primary IO ports of spi\_boot.

 Table 4: List of IO ports