{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# AI Engine With PL Example"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The AI engine with PL example demonstrates how to use AI engine for computation,\n",
    "and use PL for data movement. In this example, to run the matrix multiplication \n",
    "on AI engine, we use standard matrix multiplication algorithm.The user can change \n",
    "the matrix size and the number of cores utilized at compile-time. The expected \n",
    "matrix size must be a multiple of 50 (number of cores used) with the minimum and \n",
    "maximum value as 100x100 and 500x500 respectively.\n",
    "\n",
    "Please note that this example is intended to be a proof of concept only. There\n",
    "can be other ways of implementation, which can leverage more of the AIE resource\n",
    "and hence can result in better performance figures.\n",
    "\n",
    "## Introduction\n",
    "\n",
    "Consider two matrices A and B, the product of the two, i.e. AxB, is a linear\n",
    "combination of the the columns of A by matrix B. This means that the elements\n",
    "in a row (i) of A are multiplied with the elements in a column of B (j) and are\n",
    "summed up to give the corresponding single element in the matrix AxB at i, j.\n",
    "This means that if A is an n x m matrix and B is an m x p matrix, then the\n",
    "corresponding product AxB would have dimensions n x p. Note how the number of\n",
    "columns of A equals the number of rows in B to make the matrix multiplication\n",
    "possible.\n",
    "\n",
    "## Implementationi Details\n",
    "\n",
    "\n",
    "Data slicing and data movement\n",
    "----------------------------------------------\n",
    "\n",
    "To compute matrix multiplication on AIE, matrix A is sliced horizontally and\n",
    "distributed equally among all the core utilized through the AXI-Stream network.\n",
    "Matrix B is transposed and feed to the first core in the design element by\n",
    "element. The first core shares the input matrix B with the next core through the\n",
    "AXI-Stream connection. As the output received from the cores is in column-major\n",
    "fashion, hence a transpose of the output matrix is expected.\n",
    "\n",
    "<img src=\"pics/data_movement.png\">\n",
    "\n",
    "The application uses the PLIO attribute to make external memory-mapped\n",
    "connections to or from global memory. These connections can be created between\n",
    "AIE kernel and the logical global memory port of the hardware platform design\n",
    "via AXI-Multichannel Direct Access IP in the fabric. In this design, the buffer\n",
    "descriptors are programmed in the AXI-MCDMA IP to initiate AIE to DDR read\n",
    "and write transactions in the PS program. The burst-length of the memory-mapped\n",
    "transaction is 64-bit, with required bandwidth as 1000 MB/s and the memory\n",
    "addressing as physical.\n",
    "\n",
    "\n",
    "<img src=\"pics/aie_app_data_movement.png\">\n",
    "\n",
    "\n",
    "### Compilation Flow\n",
    "\n",
    "<img src=\"pics/compilation_flow.png\">\n",
    "\n",
    "There are 2 sets of external interfaces for AI Engine configuration\n",
    "- AI Engine configuration\n",
    "\tDirect call to AI Engine driver APIs\n",
    "\tCDO parsing APIs\n",
    "- ELF loading\n",
    "\tRemoteProc APIs\n",
    "\n",
    "The high-level tool, Vitis, can generate outputs in those 2 formats. Also, the\n",
    "user can manually implement the application using direct calls and compile the\n",
    "ELF using the low-level compiler.\n",
    "\n",
    "The generated aie_control.cpp is cross-compiled to run on the target. The\n",
    "compile application loads the generated ELF to the corresponding tile through AI\n",
    "Engine remoteproc instance. The AI Engine configuration is done by calling AI\n",
    "Engine driver APIs directly or pass CDO object through CDO parser library. The\n",
    "CDO parser is an external component, and the AI Engine software uses the CDO\n",
    "APIs.\n",
    "\n",
    "\n",
    "\n",
    "### Run-time Execution\n",
    "\n",
    "At run-time, Linux application binary calls (UIO based) AI Engine userspace\n",
    "driver and (tool generated) run time library, libcardano_api.so. AIE userspace\n",
    "drivers abstract the libmetal and remoteproc APIs to handle runtime\n",
    "configuration along with ELF loading.\n",
    "\n",
    "\n",
    "<img src=\"pics/runtime_execution.png\">\n",
    "\n",
    "The libmetal provides the IO abstraction to the application, which allows the\n",
    "application to be platform independent, ex standalone and Linux. So all the IO\n",
    "in the driver is routed through libmetal, and libmetal can handle platform\n",
    "specific part.\n",
    "\n",
    "The Linux UIO subsystem allows to run IO from Linux userspace with a small\n",
    "kernel module, including the MMIO and interrupt handling. The UIO interfaces are\n",
    "based on the Linux sysfs, and the AI Engine driver stack utilizes this UIO\n",
    "subsystem through libmetal, to enable platform-independent AI Engine software\n",
    "stack. So the major part of the AIE specific code resides in the userspace as a\n",
    "library, which can be reused between other software platforms such as baremetal.\n",
    "\n",
    "### Build Application Using PetaLinux Tools\n",
    "\n",
    "By default, the AIE matrix multiplication application is enabled. To enable,\n",
    "do following:\n",
    "\n",
    "petalinux-config -c rootfs\n",
    "```\n",
    "    [*] user packages --->\n",
    "        [*] aie-matrix-multiplication\n",
    "```\n",
    "To rebuild the project run,\n",
    "\n",
    "petalinux-build.\n",
    "\n",
    "The generated FIT image will be in images/linux/image.ub.\n",
    "\n",
    "Follow PetaLinux boot process to launch the Linux on the target.\n",
    "After you see the Linux login prompt, you can log in with user \"root\" and\n",
    "password \"root\".\n",
    "\n",
    "The AIE ELFs are installed in the `/lib/firmware` directory.\n",
    "We will need to go to `/lib/firmware` to run the application.\n",
    "\n",
    "### Sample Output\n",
    "```\n",
    "root@xilinx-vc-p-a2197-00-reva-x-prc-01-reva-2019_2_siea:~# cd /lib/firmware/aie\n",
    "root@xilinx-vc-p-a2197-00-reva-x-prc-01-reva-2019_2_siea:~# aie-matrix-multiplication\n",
    "PLIO MCDMA> reset_dma\n",
    "metal: info:      Registered shmem provider linux_shm.\n",
    "metal: info:      Registered shmem provider ion.reserved.\n",
    "metal: info:      Registered shmem provider ion.ion_system_contig_heap.\n",
    "metal: info:      Registered shmem provider ion.ion_system_heap.\n",
    "metal: info:      device xilinx-aiengine in use by driver uio_dmem_genirq\n",
    "metal: error:     device platform:f70a0000.aie-npi not found\n",
    "metal: error:     device platform:f70a0000.aie-npi not found\n",
    "PLIO MCDMA> allocated ddr mem: 0x7f95e28000.\n",
    "PLIO MCDMA> pa = 0x70100000.\n",
    "PLIO MCDMA> Successful\n",
    "```\n",
    "### Customizing and Rebuilding\n",
    "Following are prerequisites if the user wants to make any changes in\n",
    "software,\n",
    "  * Cardano and cross-compile binaries for any software-related changes.\n",
    "  * PetaLinux Tools.\n",
    "\n",
    "As mentioned earlier, the user can change the number of cores utilized by matrix\n",
    "multiplication. However, since the data memory immediately available to the core\n",
    "is limited, reducing the number of cores utilized reduces the maximum matrix\n",
    "size supported by the application. Within the config.h header file NUM_CORES\n",
    "macro can be altered to change the number of cores utilized.  The maximum cores\n",
    "available are 50. With the changes made in the application, care must be taken\n",
    "by the user that the newly generated configuration is supported by the\n",
    "underlying hardware design.\n",
    "\n",
    "To rebuild, go to the meta-user demo recipe files directory:\n",
    "project-spec/meta-user-recipes-apps/aie-matrix-multiplication/files.\n",
    "set the CARDANO_ROOT and then call AI engine compiler to build:\n",
    "```\n",
    "  export CARDANO_ROOT=<Root_To_Installed_Cardano_which_is_under_Vitis>\n",
    "  source $CARDANO_ROOT/scripts/cardano_env.sh\n",
    "  aiecompiler --constraints graph_aie_constraints.aiecst \\\n",
    "              --dataflow src/xgemm.cpp --test-iterations=4096 --verbose \\\n",
    "              -full-program=true --use-phy-shim -use-real-noc=true \\\n",
    "              -Xmapper=-c0 -device VC1902 -phydevice xcvc1902 \\\n",
    "              -stacksize=4608 -heapsize=27136 --enable-async-window\n",
    "```\n",
    "After the AIE application is customized and compiled, user can run\n",
    "`clean-aie-work.sh` to clean the Work directory to remove unncessary files,\n",
    "only leave the AIE kernels and the aie_control.cpp file.\n",
    "\n",
    "The Linux AIE graph .cpp file is in the `src/` directory. After you build the\n",
    "AIE application, if you need to build the Linux control application.\n",
    "\n",
    "You can use the `Makefile.Linux` in the `aie-matrix-multiplication/files/src`\n",
    "directory to build the Linux AIE graph control application.\n",
    "You will need to specify CARDANO_ROOT and the Linux sysroot.\n",
    "\n",
    "To get the Linux sysroot, you can do this with PetaLinux Tools. Go to the PetaLinux\n",
    "project directory, run the following command:\n",
    "```\n",
    "  $ petalinux-build --sdk\n",
    "  $ petalinux-build --sysroot\n",
    "```\n",
    "The sysroot will be generated in `images/linux/sdk/sysroots/aarch64-xilinx-linux/`\n",
    "Use Makefile.Linux to rebuild the Linux AIE graph application:\n",
    "```\n",
    "  $ make -f Makefile.Linux \\\n",
    "    SYSROOT=<plnx_proj>/images/linux/sdk/sysroots/aarch64-xilinx-linux/ \\\n",
    "    CARDANO_ROOT=<cardano_root>\n",
    "```\n",
    "\n",
    "The generated Linux binary will be `aie-matrix-multiplication`.\n",
    "\n",
    "The user can then rebuild PetaLinux.\n",
    "\n",
    "NOTE:\n",
    " * No hardware change is supported in this version of Vivado for this design.\n",
    "\n",
    "### PDI Generation\n",
    "Vivado can generate a boot PDI includesd PS, PL and NoC configuration with\n",
    "\"Generate bitstream\" icon from Vivado GUI, but it will not includes the AIE\n",
    "configuration and AIE ELFs.\n",
    "\n",
    "In order to includes the AIE configuration and ELFs into the boot PDI, user\n",
    "will need to update the BIF of the boot PDI generated by Vivado.\n",
    "Vivado generate the boot PDI and the BIF in the hardware project's\n",
    "\"*.runs/impl_1/\" directory, e.g.:\n",
    " * ps0_me_dma_wrapper.pdi\n",
    " * ps0_me_dma_wrapper.pdi.bif\n",
    "\n",
    "User will need to generate the AIE configuration CDO with cardano first,\n",
    "and update the bif to includes the AIE CDO and ELFs.\n",
    "\n",
    "To generate the AIE CDO, user can go to the cardano generated AIE Work/\n",
    "directory which is generated while compiling the AIE application:\n",
    "```\n",
    "  cd Work/ps/cdo/\n",
    "```\n",
    "Source cardano settings,\n",
    "```\n",
    "  export CARDANO_ROOT=<Installed_Vitis>/cardano/\n",
    "  source $CARDANO_ROOT/scripts/cardano_env.sh\n",
    "```\n",
    "Run the ./generateAIEConfig from this directory. It will generate aie_cdo.bin\n",
    "file.\n",
    "And then please add the following configuration partitions to the boot PDI\n",
    "BIF file generated by Vivado to includes the AIE configuration and ELFs:\n",
    "```\n",
    " partition\n",
    "  {\n",
    "   id = 0x10\n",
    "   type = cdo\n",
    "   file = <AIE_APP>/Work/ps/cdo/aie_cdo.bin\n",
    "  }\n",
    "  partition\n",
    "  {\n",
    "   id = 0x12\n",
    "   core = aie\n",
    "   file = <AIE_APP>/Work/aie/0_0/Release/0_0\n",
    "  }\n",
    "  ...\n",
    "  partition\n",
    "  {\n",
    "   id = 0x12\n",
    "   core = aie\n",
    "   file = <AIE_APP>/Work/aie/<C>_<R>/Release/<C>_<R>\n",
    "  }\n",
    "\n",
    "```\n",
    "\n",
    "And then use bootgen to generate the PDI, you can source vitis settings to use\n",
    "bootgen. E.g.\n",
    "```\n",
    "  bootgen -arch versal -image <BIF> -o <PDI>\n",
    "```\n",
    "\n",
    "The \"files/aie-pdi-gen.sh\" gives an example on how to generate a boot PDI to\n",
    "include AIE configuration and ELFs or a partial PDI which only contains AIE\n",
    "configuration and ELFs.\n",
    "The `files/new-aie.bif` gives an example of a boot PDI BIF to contains AIE.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run Demo Example\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "cd /lib/firmware/aie"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
     {
       "name": "stdout",
       "output_type": "stream",
       "text": []
     }
   ],
   "source": [
    "!aie-matrix-multiplication"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
