\documentclass{gqtekspec}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%
%% Filename: spec.tex
%%
%% Project: OpenArty, an entirely open SoC based upon the Arty platform
%%
%% Purpose: This file describes how to build the specification for the
%% OpenArty project. It's not nearly as interesting as the
%% .pdf file it creates, although you are welcome to browse through here
%% should you wish.
%%
%% Running "make" in the doc/ project directory (one up from where this
%% file sits) should build this into a PDF file.
%%
%% For those who are unable to build the PDF file, a copy of it is kept
%% in the distribution.
%%
%% Creator: Dan Gisselquist, Ph.D.
%% Gisselquist Technology, LLC
%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%
%% Copyright (C) 2015-2016, Gisselquist Technology, LLC
%%
%% This program is free software (firmware): you can redistribute it and/or
%% modify it under the terms of the GNU General Public License as published
%% by the Free Software Foundation, either version 3 of the License, or (at
%% your option) any later version.
%%
%% This program is distributed in the hope that it will be useful, but WITHOUT
%% ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
%% FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
%% for more details.
%%
%% You should have received a copy of the GNU General Public License along
%% with this program. (It's in the $(ROOT)/doc directory, run make with no
%% target there if the PDF file isn't present.) If not, see
%% <http://www.gnu.org/licenses/> for a copy.
%%
%% License: GPL, v3, as defined and found on www.gnu.org,
%% http://www.gnu.org/licenses/gpl.html
%%
%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%
%%
\usepackage{import}
\usepackage{bytefield}
\usepackage{listings}
\project{OpenArty}
\title{Specification}
\author{Dan Gisselquist, Ph.D.}
\email{dgisselq (at) ieee.org}
\revision{Rev.~0.0}
\begin{document}
\pagestyle{gqtekspecplain}
\titlepage
\begin{license}
Copyright (C) \theyear\today, Gisselquist Technology, LLC
This project is free software (firmware): you can redistribute it and/or
modify it under the terms of the GNU General Public License as published
by the Free Software Foundation, either version 3 of the License, or (at
your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.
You should have received a copy of the GNU General Public License along
with this program. If not, see \texttt{http://www.gnu.org/licenses/} for a copy.
\end{license}
\begin{revisionhistory}
0.0 & 6/20/2016 & Gisselquist & First Draft \\\hline
0.0 & 10/21/2016 & Gisselquist & More Comments Added\\\hline
0.0 & 11/18/2016 & Gisselquist & Added a getting started section\\\hline
0.0 & 04/19/2017 & Gisselquist & Added memory config setup images\\\hline
\end{revisionhistory}
% Revision History
% Table of Contents, named Contents
\tableofcontents
\listoffigures
\listoftables
\begin{preface}
\end{preface}
\chapter{Introduction}\label{ch:intro}
\pagenumbering{arabic}
\setcounter{page}{1}
At {\$ 99}, the Arty is a very economical FPGA platform for doing
a lot of things. It was designed to support the MicroBlaze soft CPU platform,
and as a result it has a lot more memory plus ethernet support. Put together,
it feels like it was designed for soft--core CPU development. Indeed, it has
an amazing capability for its price.
Instructions and examples for using the Arty, however, tend to focus on
schematic design development techniques. While these may seem like an
appropriate way to introduce a beginner to hardware design, these techniques
introduce a whole host of problems.
The first and perhaps biggest problem is that it can be difficult to
troubleshoot what is going on. This is a combination of several factors. The first is
that many of the reference schematic designs make use of proprietary IP. In
an effort to protect both their IP and themselves, companies providing such
IP resources often make them opaque, and difficult to see the internals of.
As a result, it can be difficult to understand why that IP isn't working in
your design. Further, while many simulation tools exist, only the Xilinx tools
will allow full simulation of Xilinx proprietary IP. Finally, while it may be
simple to select a part and ``wire'' it up within a schematic, most IP
components have many, many configuration options which are then hidden from the
user within the simplified component. These options may be the difference
between successfully using the component and an exercise in frustration.
Put together, all of these features of schematic design make the design more
difficult to troubleshoot, and often even impossible to troubleshoot using
open source tools such as Verilator.
Another problem is that schematic based designs often hide their FPGA resource
usage. They can easily become resource hogs, leaving the designer unaware
of the consequences of what he/she is implementing. As an example, the memory
interface generated by Xilinx's Memory Interface Generator (MIG) consumes
nearly a full quarter of the Arty's FPGA resources, while delaying responses
to requests by upwards of 250~ns. Further, while Xilinx touts its MicroBlaze
processor as only using 800--2500 LUTs, the MicroBlaze architecture requires
it be connected to four separate AXI busses, with each of those having
five channels, all with their requests and acknowledgement flags. These
can therefore easily consume all of the resources within an architecture, before
providing any of the benefit the designer was looking for when they chose to
use an FPGA.
% What is old
% Arty, XuLA, Learnables using schematic drawing techniques
% What does the old lack?
% Arty lacks open interfaces, instead using MIG and CoreGen w/ AXI bus
% What is new
% OpenArty has its own memory interface controller, and runs everything
% off of an open Wishbone bus structure.
% What does the new have that the old lacks
%
% What performance gain can be expected?
%
In this project, we present an alternative.
First, the OpenArty is entirely built in Verilog, and (with the exception of
the MIG controller) it is entirely built out of open source IP.\footnote{I'm
still hoping to place an open memory controller into this design. This
controller is written in logic, but does not yet connect to any hardware ports.}
Second, configuration options, such as cache sizes, can be fine tuned via a
CPU options file.
Third, as you will find from examining the RTL sources, this project uses only
one bus, and that bus has only one channel associated with it: a Wishbone Bus.
This helps to limit the logic associated with trying to read and write from
the CPU, although it may increase problems with fanout.
Finally, because the OpenArty project is made from open source components, the
entire design, together with several of its peripherals, can be simulated using
Verilator. This makes it possible to run programs on the ZipCPU within the
OpenArty design, and find and examine where such programs (or their peripherals)
fail.
Overall, the goals of this OpenArty project include:
\begin{enumerate}
\item Use entirely open interfaces
This means not using the Memory Interface Generator (MIG), the
Xilinx CoreGen IP, etc.
(This goal has not yet been achieved.)
\item Use all of Arty's on--board hardware: Flash, DDR3-SDRAM, Ethernet, and
everything else at their full and fastest speed(s). For example, the
flash will need to be clocked at 82~MHz, not the 50~MHz I've clocked
it at in previous projects. The DDR3 SDRAM memory should also be able
to support pipelined 32--bit interactions over the Wishbone bus at a
162~MHz clock. Finally, the Ethernet controller should be supported
by a DMA capable interface that can drive the ethernet at its full
100~Mbps rate.
(Of these, only the ethernet goal has been met.)
\item Run using a 162.5~MHz system clock, if for no other reason than to gain
the experience of building logic that can run that fast.\footnote{The
original goal was to run at 200~MHz. However, the memory controller
cannot run faster than about 82~MHz. If we run it at 81.25~MHz and
double that clock to get our logic clock, that now places us at
162.5~MHz. 200~MHz is \ldots too fast for DDR3 transfers using the
Artix--7 chip on the Arty.}
While the wishbone bus has been upgraded so that it may run at
200~MHz, the CPU and memory controller cannot handle this speed (yet).
\item Modify the ZipCPU to support an MMU and a data cache, and perhaps even
a floating point unit.
(These are still in development.)
\item The default configuration will also include four Pmods: a USBUART,
a GPS, an SDCard, and an OLEDrgb.
(These have all been tested, and are known to work.)
\end{enumerate}
I intend to demonstrate this project with a couple programs:
\begin{enumerate}
\item An NTP Server
While the GPS tracking circuit is in place, and while it appears to be
able to track a GPS signal to within about 100~ns or so, the
network stack has yet to be built.
\item A ZipOS that can actually load and run programs from the SD Card, rather
than just a static memory image stored in flash on start-up.
This will require a functioning memory management unit (MMU), which
will be a new addition to the ZipCPU created to support this project.
For those not familiar with MMUs, an MMU translates memory addresses
from a virtual address space to a physical address space. This allows
every program running on the ZipCPU to believe that it owns the entire
memory address space, while allowing the operating system to allocate
actual physical memory addresses as necessary to support whatever
program needs more (or less) memory.
At this point, the MMU has been written and has passed its bench
testing phase. It has not (yet) been integrated with the CPU.
\end{enumerate}
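
To make the address translation mentioned in the ZipOS item above a little
more concrete, the following Python sketch models a single--level page table.
It is a generic illustration only: the 4~kB page size, the table layout, and
the names {\tt PAGE\_BITS}, {\tt translate}, {\tt prog\_a}, and {\tt prog\_b}
are assumptions made for this example, and do not describe the actual design
of the ZipCPU's MMU.

\begin{lstlisting}[language=Python]
# Generic single-level page table; NOT the ZipCPU MMU's actual design.
PAGE_BITS = 12                      # assumed 4 kB pages
PAGE_MASK = (1 << PAGE_BITS) - 1

def translate(page_table, vaddr):
    """Map a virtual address to a physical one, or raise on a page fault."""
    vpage = vaddr >> PAGE_BITS      # virtual page number
    offset = vaddr & PAGE_MASK      # the offset within a page is unchanged
    if vpage not in page_table:
        raise LookupError("page fault at {:#010x}".format(vaddr))
    return (page_table[vpage] << PAGE_BITS) | offset

# Each program has its own table, so the same virtual address can land
# on different physical pages (hypothetical page numbers).
prog_a = {0x00000: 0x00123}
prog_b = {0x00000: 0x00456}
print(hex(translate(prog_a, 0x00000abc)))
print(hex(translate(prog_b, 0x00000abc)))
\end{lstlisting}

Both programs use virtual address {\tt 0x00000abc}, yet each is mapped to its
own physical page. This is the property that lets every program believe it
owns the entire address space.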
\chapter{Getting Started}\label{ch:getting-started}
\section{Building the Core}
This section includes instructions for how to build the .bit file with Vivado.
Ideally, these instructions would read, ``include all the files in the RTL
directory, pick the file toplevel.v, and implement it. Then build the
bitstream.'' That's how simple the instructions should be. Sadly, because of
the Memory Interface Generator (MIG), and the fact that Xilinx has not published
the details necessary to build a proper DDR3 SDRAM controller using their
hardware, you will need to build a MIG core for the Arty. This is sadly not
trivial.
Let's walk through the process.
The first steps are the obvious ones. First, create a project. I personally
like to create my project so that all of the Xilinx files get placed into a
{\tt xilinx/} project subdirectory, such as in Fig.~\ref{fig:proj_dir}.
\begin{figure}\begin{center}
\includegraphics[width=2in]{../gfx/proj_dir.eps}
\caption{Create a project directory}\label{fig:proj_dir}
\end{center}\end{figure}
but that's really up to you. The next step is to select the part. You can
find the part on the Arty schematic, yielding the choices shown in Fig.~\ref{fig:pick-a-part}.
\begin{figure}\begin{center}
\includegraphics[width=4in]{../gfx/pick-a-part.eps}
\caption{Select the correct FPGA}\label{fig:pick-a-part}
\end{center}\end{figure}
As part of
creating the project, you will want to add all of the {\tt *.v} files from the
{\tt rtl/} and {\tt rtl/cpu/} subdirectories to your project. You'll also want
to add the constraints file, {\tt arty.xdc}, found in the main project
directory, to your project.
That's the easy part.
The harder part is building the MIG core, which you will need to build the rest
of the project. I'll use a series of screen shots to describe this process.
We'll start with opening the IP Catalog from the tools menu, as in
Fig.~\ref{fig:ipcat-menu}.
\begin{figure}\begin{center}
\includegraphics[width=2in]{../gfx/ipcat-menu.eps}
\caption{Open the IP catalog from the Vivado Window Menu}\label{fig:ipcat-menu}
\end{center}\end{figure}
That menu item will bring you to a list of ``Cores'' within the ``IP Catalog''.
From that list of cores, find the ``Memory Interface Generator'' item and double
click on it, as in Fig.~\ref{fig:openmig}.
\begin{figure}\begin{center}
\includegraphics[width=4.5in]{../gfx/openmig.eps}
\caption{Open the Memory Interface Generator from the IP catalog}\label{fig:openmig}
\end{center}\end{figure}
If everything is set up properly, you should just be able to click next and
move to the first MIG configuration screen. That screen should look something
like Fig.~\ref{fig:mig_create}.
\begin{figure}\begin{center}
\includegraphics[width=5in]{../gfx/mig_01-create.eps}
\caption{MIG \#1}\label{fig:mig_create}
\end{center}\end{figure}
The important parts of this screen are the name of the memory controller,
{\tt mig\_axis}, which needs to match the {\tt migsdram.v} file, and the
fact that we will be using the {\tt AXI interface}.
%
Further MIG setup screens are shown in Figs.~\ref{fig:mig_create}
through~\ref{fig:mig_syssigs}. These figures may be found, in their full
screen resolution, within the {\tt doc/gfx} directory, should you struggle to
read them here. We will walk through them here regardless.
Fig.~\ref{fig:mig_pincompat}
\begin{figure}\begin{center}
\includegraphics[width=5in]{../gfx/mig_02-pincompat.eps}
\caption{Select Pin Compatible FPGAs}\label{fig:mig_pincompat}
\end{center}\end{figure}
shows the dialog for selecting a pin compatible FPGA, should we wish to build
one MIG interface that will work for many FPGA's. Our purpose on the Arty is
just to get an interface up and running, so we don't need any pin compatible
FPGA settings.
Since the Arty has DDR3 SDRAM on it, we move forward with
Fig.~\ref{fig:mig_memsel}
\begin{figure}\begin{center}
\includegraphics[width=5in]{../gfx/mig_03-memsel.eps}
\caption{Memory standard selection: Pick DDR3}\label{fig:mig_memsel}
\end{center}\end{figure}
and select DDR3 instead of DDR2 (an earlier memory standard).
The next, and perhaps more complicated step, is to set up the memory controller
options. This is shown in Fig.~\ref{fig:mig_mctopts}.
\begin{figure}\begin{center}
\includegraphics[width=5in]{../gfx/mig_04-mctopts.eps}
\caption{MIG Configuration Options}\label{fig:mig_mctopts}
\end{center}\end{figure}
The critical piece in this setup is that we want a 3077~ps clock. The actual
constraints are that the clock period be at least 3077~ps (no faster than
325~MHz, an Artix--7 constraint for this part), but no more than 3300~ps (no
slower than about 300~MHz, a DDR3 constraint). While the part number listed is not the
exact part number for the board, because the speed is running so much slower
than the memory can accomplish, the difference becomes irrelevant. From the
schematic, the part has 16~lanes and it is wired at 1.35V. As for the ordering,
we could go either way. We set it to normal ordering.
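
As a quick check of the clock limits quoted above, the frequency that goes
with each period follows from $f = 1/T$:
\begin{eqnarray*}
f_{\max} &=& \frac{1}{3077\,\mbox{ps}} \approx 325.0\,\mbox{MHz}, \\
f_{\min} &=& \frac{1}{3300\,\mbox{ps}} \approx 303.0\,\mbox{MHz}.
\end{eqnarray*}
The chosen 3077~ps period therefore sits at the fast edge of the allowed
window. Dividing 325~MHz by four gives 81.25~MHz, which matches the memory
controller clock quoted in the introduction. The factor of four is an
assumption made here because it is consistent with those two numbers. It has
not been read off of the MIG screens.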
One part of this screen does need some discussion: there are eight banks on this
memory part. Each bank can be accessed independently. Eight bank controllers,
one for each bank, would therefore make sense. We take Xilinx's suggestion
here to have fewer controllers, yet this is an option that may be adjusted
later.
The next configuration screen is the AXI parameter options, shown in
Fig.~\ref{fig:mig_axiopts}.
\begin{figure}\begin{center}
\includegraphics[width=5in]{../gfx/mig_05-axiopts.eps}
\caption{AXI memory options}\label{fig:mig_axiopts}
\end{center}\end{figure}
For this, we set the data width to the memory to
be 128~bits--the natural width of all memory operations. Narrow burst support
would offer more logic for no additional benefit, so we leave that off.
Finally, since we've measured
the delay through the controller at nearly 30~clocks, and since the wishbone
to AXI converter depends upon this interaction ID to re--order things when
done, we set this to 5~bits. (30~clocks $< 2^5$)
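
Put as a formula, an ID of width $w$ can tell apart $2^w$ outstanding
requests. Assuming at most one request is issued per clock (an assumption of
this sketch, not a measured property), a delay of about 30~clocks means up to
$N = 30$ requests may be in flight, so that
\[
w = \lceil \log_2 N \rceil = \lceil \log_2 30 \rceil = 5,
\qquad 2^5 = 32 \ge 30 .
\]
A 4~bit ID would only distinguish 16 requests, which would not be enough
under this assumption.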
The memory options configuration menu is also non--trivial. We choose the
settings shown in Fig.~\ref{fig:mig_memops}.
\begin{figure}\begin{center}
\includegraphics[width=5in]{../gfx/mig_06-memops.eps}
\caption{Memory Controller Options}\label{fig:mig_memops}
\end{center}\end{figure}
Specifically, the external clock input to the Arty is 100~MHz. The frequency
of this clock is tightly tied to the clock rate set in
Fig.~\ref{fig:mig_mctopts}. Specifically, not only is the memory clock period
within Fig.~\ref{fig:mig_mctopts} tightly constrained, but that memory clock
period needs to be a ratio of the input clock period. Further, the numerator
in that ratio cannot be such that the numerator times the input clock frequency
is greater than 1600~MHz. Beyond this, we take the next four parameters from
Digilent's memory generation file. The last item on this screen, memory order,
we select so that sequential reads (should) never stall the bus. This choice
allows the Xilinx memory controller to precharge and activate the next bank,
before the actual cycle when it is accessed---saving several stall cycles.
(Whether or not Xilinx's memory controller does \ldots I don't know. Mine did.)
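
As a worked example of the clock ratio constraint described above, take the
100~MHz input clock together with the 3077~ps (325~MHz) memory clock chosen
earlier. Writing the memory clock as a ratio of the input clock gives
\[
f_{\mbox{\scriptsize mem}} = f_{\mbox{\scriptsize in}} \times \frac{13}{4}
	= 100\,\mbox{MHz} \times \frac{13}{4} = 325\,\mbox{MHz},
\qquad 13 \times 100\,\mbox{MHz} = 1300\,\mbox{MHz} \le 1600\,\mbox{MHz} .
\]
The ratio $13/4$ is simply the fraction $325/100$ in lowest terms. Whether the
MIG uses exactly these multiplier and divider values internally is not
something this example confirms. It only shows that at least one ratio
satisfying both constraints exists.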
Further FPGA options can be seen in Fig.~\ref{fig:mig_fpgaopts}.
\begin{figure}\begin{center}
\includegraphics[width=5in]{../gfx/mig_07-fpgaopts.eps}
\caption{MIG FPGA Options}\label{fig:mig_fpgaopts}
\end{center}\end{figure}
Two particular things to note. The first is that the OpenArty design feeds
clocks to the MIG controller from a clock management tile rather than from
external pins. As a result, no buffers are required on either the system clock
pin or the reference clock pin. The second item is the choice to allow the MIG
controller to handle its own temperature measurements. This means that it uses
the XADC on board for that purpose. Had we wished to do otherwise, we would
have needed to provide the output of the XADC temperature reading to the
controller ourselves. The current Arty design does not (yet) do this.
Working from Digilent's MIG project file again, the termination impedance in
Fig.~\ref{fig:mig_extops}
\begin{figure}\begin{center}
\includegraphics[width=5in]{../gfx/mig_09-extops.eps}
\caption{Termination Impedance}\label{fig:mig_extops}
\end{center}\end{figure}
needs to be set to 50~Ohms.
The next challenge is pin assignment. For this, we start by selecting a fixed
and known pin assignment in Fig.~\ref{fig:mig_ioops},
\begin{figure}\begin{center}
\includegraphics[width=5in]{../gfx/mig_10-ioops.eps}
\caption{Select a fixed pin assignment}\label{fig:mig_ioops}
\end{center}\end{figure}
since the Arty board has already been built and \ldots it would be a challenge
to switch pins. This brings up a next screen where pins can be selected and
assigned. As this is a long tedious process, there is a file in the
distribution named {\tt migmem.xdc} which can be used for this purpose.
Select the ``READ XDC/UCF'' option, and then select the {\tt migmem.xdc}
file from the menu that follows as in Fig.~\ref{fig:mig_loaducf}.
\begin{figure}\begin{center}
\includegraphics[width=5in]{../gfx/mig_11-loaducf.eps}
\caption{Load a known XDC file}\label{fig:mig_loaducf}
\end{center}\end{figure}
Once the pins have been loaded, you should just be able to click {\em Validate}
and then move onto the next page of the dialog.
That page offers pin selection for several external I/O pins that the memory
controller may connect to. We choose to leave these unconnected, as in
Fig.~\ref{fig:mig_syssigs}.
\begin{figure}\begin{center}
\includegraphics[width=4in]{../gfx/mig_12-syssigs.eps}
\caption{MIG \#14}\label{fig:mig_syssigs}
\end{center}\end{figure}
The last screen is a confirmation screen. Click next, and accept the Micron
license agreement, and then a couple more next buttons, and the final Generate
button will cause the MIG to build a DDR3 SDRAM controller.
At this point, you should now be able to build the OpenArty {\tt .bit} file.
%
%
\section{Building the board support files}
The OpenArty project comes with a series of board support programs that are
designed to run from a Linux command line. The C++ source code for these
programs can be found in the sw/host directory. These programs have two
dependencies: the ZipCPU load program depends upon libelf, and the ZipCPU
debugger depends upon the ncurses library. If you have these two libraries,
your build should proceed without problems. If not, you may get them simply
by issuing:
\begin{lstlisting}[language=bash]
% sudo apt-get install ncurses-dev libelf-dev texinfo
\end{lstlisting}
% TODO: Remove the dependency on ZIPD.
A make in the sw/host directory should build all of these support programs.
These include:
\begin{itemize}
\item {\tt wbregs}: a program to read and write addresses on the wishbone bus,
and hence to test peripherals independent of the CPU.
\item {\tt netuart}: a program to convert the UART device provided by the board
to a TCP/IP device that can be connected to anywhere.
\item {\tt wbsettime}: a simple program to set the time on the real-time clock
core within the board.
\item {\tt dumpflash}: reads the current contents from the flash memory into a
local file
\item {\tt wbprogram}: programs new configurations into the flash
\item {\tt netsetup}: reads and decodes the MDIO interface from the ethernet
PHY controller
\item {\tt manping}: pings a computer using the ethernet packet interface.
This program does not have any ARP handling, so while it will wait
for a reply, the reply typically comes back in the form of an ARP
request rather than the ping response.
\item {\tt zipload}: Loads a program onto the ZipCPU, adjusting flash, block
	RAM, and/or SDRAM memory to do so. May also start the program running
if requested.
\item {\tt zipstate}: Returns information about whether or not the CPU is
running, is running in user mode, is waiting for an interrupt,
has halted, etc.
\item {\tt zipdbg}: a debugger with the capability to halt, reset and step
the CPU, as well as to inspect the state of the CPU following any
unexpected halt.
\end{itemize}
\section{Building the Verilator Simulation}
If you are at all interested in building the verilator simulation, you will
also need Verilator and GTKMM-3.0. To get these, you may type:
\begin{lstlisting}[language=bash]
% sudo apt-get install verilator libgtkmm-3.0-dev
\end{lstlisting}
At this point, a {\tt make} in the {\tt rtl} directory, followed by a
{\tt make} in the {\tt bench/cpp} directory will build a Verilator simulation
named {\tt busmaster\_tb}. You may run this program in place of {\tt netuart},
and then access the simulated Arty using the regular board support packages.
This simulation will use the TCP/IP port given in {\tt bench/cpp/port.h}, which
should be set identically to the port given in {\tt sw/host/port.h} used by
{\tt netuart}.
One difficulty at this point might be the fact that the Verilator simulation
file, {\tt fastmaster\_tb.cpp}, may reference unknown variables. This is a
result of an upgrade in Verilator beyond the current OpenArty distribution.
The variable names have changed from names starting with {\tt v\_\_DOT\ldots}
to names beginning with {\tt busmaster\_\_DOT\ldots}. Should there be any
confusion, the {\tt Vbusmaster.h} file in {\tt rtl/obj\_dir} should define
the correct names within it.
\section{Initially installing the core}
The OpenArty core may be installed onto the board via the Xilinx Hardware
Manager. If properly set up, you should be able to open the hardware
manager after you build an initial bit stream, open the Arty, select the
toplevel bit file, and request Xilinx to load the file.
If you are successful, the four color LEDs will blank while the hardware
manager is loading the hardware, and then turn to varying intensities of red.
\section{Connecting the PMods}
The OpenArty project is designed to work with four PMods: PModUSBUART,
PModGPS, PModSD, and PModOLEDrgb. These four provide the device with
serial port access, absolute time and position information, access to an
SD card, and the ability to control a small display.
If you do not have any of these devices, and wish to recover the logic used
by them, you may comment out the defines for {\tt GPS\_CLOCK},
{\tt SDCARD\_ACCESS}, and {\tt OLEDRGBACCESS} found in the {\tt rtl/busmaster.v}
file. This will recover all but the logic used by the PModUSBUART and PModGPS
serial ports, while replacing the registers with read--only memory values of
zero. Be aware, though, that doing this will cause any program using
the {\tt stdin}, {\tt stdout}, or {\tt stderr} file streams to hang.
The {\tt arty.xdc} file is designed so that these PMods can be connected as
shown in Fig.~\ref{fig:pmod-pic}.
\begin{figure}\begin{center}
\includegraphics[width=4in]{../gfx/openarty.eps}
\caption{Showing how the PMods are Connected}\label{fig:pmod-pic}
\end{center}\end{figure}
In this example, the PModOLED is connected to PMod port JB, and the PModSD is
connected to PMod port JD. The PModGPS and the PModUSBUART are both
connected to port JC, with the GPS connected on top and the USBUART on the
bottom.
\section{Testing the peripherals}
OpenArty has been designed so that all of the peripherals live on a
memory--mapped wishbone bus. This bus can be accessed, either by the ZipCPU
or by the host controller. Because of this model, peripherals may be tested
and known to work before the CPU is ever turned on. Two programs make this
possible: netuart and wbregs. Other programs may be built upon this model,
as long as they access the bus using the interface outlined in devbus.h.
Of the two programs, netuart simply turns the USB serial port interface of
the device into a TCP/IP interface. Netuart takes one argument, the
name of the serial port device which the Arty USB driver has created. In
my case, this tends to be /dev/ttyUSB1, although it has been known to change
from time to time:
\begin{lstlisting}[language=bash]
% netuart /dev/ttyUSB1
\end{lstlisting}
All of the other board support files connect to the TCP/IP port generated
by netuart. The port.h file is a compiled-in file, outlining where this
port can be found. By default, netuart listens to port {\tt 6510} on
{\tt localhost}, but it can be configured to listen to any port. The other
board support files will try to connect to netuart at the host and port
listed in port.h. Hence, if properly configured, you should be able to
access your Arty to command it, configure it, reload it, etc., from anywhere
you have internet access--in my case, from anywhere in the house.
Once you run netuart, you should then be able to watch, as a part of the
standard output stream of netuart, all of the interaction with the board.
While this may be useful for debugging, it's not all that legible to the
user. Lines that start with \hbox{``\#''} are lines from the device that are not
going to any client. A common line you will see is \hbox{``\# 0''}. This is
just the device saying that its command capability is idle. Lines that start
with \hbox{``$<$ ''} are commands going to the device, and lines starting with
\hbox{``$>$ ''} are responses from the core. So, at this point, run netuart and
wait a couple of seconds. If you do not see a \hbox{``\# 0''} line, then try a
different serial port, check that your core is properly configured, etc. Once
you do see a \hbox{``\# 0''} line, then you are ready for the next step.
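
The line prefixes described above lend themselves to a small filter. The
Python sketch below sorts lines of netuart's standard output by prefix and
checks for the idle line. The function names {\tt classify} and
{\tt is\_idle}, as well as the sample command and response text, are
hypothetical and chosen only for illustration. Only the prefixes themselves
come from the description above.

\begin{lstlisting}[language=Python]
def classify(line):
    """Label one line of netuart's standard output by its prefix."""
    if line.startswith("#"):
        return "device"      # from the device, not sent to any client
    if line.startswith("< "):
        return "command"     # a command going to the device
    if line.startswith("> "):
        return "response"    # a response coming back from the core
    return "other"

def is_idle(line):
    """True for the "# 0" line that marks an idle command channel."""
    return line.strip() == "# 0"

# The command and response payloads below are made-up placeholders.
for text in ["# 0", "< example command", "> example response"]:
    print(classify(text), is_idle(text))
\end{lstlisting}

Piping netuart's output through something like this makes it easier to spot
the first idle line when checking that the serial port is the right one.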
The easiest way to test the peripherals is via the wbregs command. This
command is similar to the ancient peek and poke commands. It takes one
or two arguments. If given one argument, it reads from that address on the
bus. If given two arguments, it writes the value of the second argument to
the bus location given by the first argument. Hence one argument peeks
at the memory bus, while two arguments poke a value onto the memory bus.
Perhaps an example will help. At this point, try typing:
\begin{lstlisting}[language=bash]
% wbregs version
\end{lstlisting}
This should return and print a 32-bit hexadecimal value to your screen,
indicating the date when you last ran make in the root directory before
building and installing your configuration into the device. This can be
very useful for knowing what configuration you are running, and whether or not
you have made the changes you thought you had made.
You may have noticed that wbregs read from address 0x0100, but did so by name.
Most of the peripherals have names for their addresses. The C language
names for these addresses can be found in regdefs.h, and a mapping to
wbregs address names can be found in regdefs.cpp.
Shall we try another? Let's try adjusting the LEDs. To turn all the LEDs off,
\begin{lstlisting}[language=bash]
% wbregs leds 0x0f0
\end{lstlisting}
To turn them all back on again,
\begin{lstlisting}[language=bash]
% wbregs leds 0x0ff
\end{lstlisting}
To turn the low order LED off without changing any others, write
\begin{lstlisting}[language=bash]
% wbregs leds 0x010
\end{lstlisting}
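The three writes above suggest that bits 7 through 4 select which LEDs a write
affects, while bits 3 through 0 give their new state. That reading is an
inference from these examples rather than a statement of the register
definition. Under that assumption, a small helper (the name {\tt led\_word} is
hypothetical) can build the value to write:
\begin{lstlisting}[language=bash]
# Build an LED register word from two 4-bit quantities:
#   $1 = mask of the LEDs to change (placed in bits 7..4)
#   $2 = new state for those LEDs   (placed in bits 3..0)
led_word() {
    printf '0x%03x\n' $(( (($1 & 0xf) << 4) | ($2 & 0xf) ))
}

led_word 0xf 0x0    # all four LEDs off:   0x0f0
led_word 0xf 0xf    # all four LEDs on:    0x0ff
led_word 0x1 0x0    # low order LED off:   0x010
\end{lstlisting}
For example, {\tt wbregs leds \$(led\_word 0x4 0x4)} should turn on the third
LED alone, leaving the other three as they were.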
Having fun? Try running the program startupex.sh from the sw directory.
This will set some LEDs and Color LEDs in a fun, startup-looking pattern.
Ready to test the UART? Using minicom, connect to the PModUSBUART. It
should also be connected to a /dev/ttyUSBx serial port device. If you aren't
sure, start minicom with:
\begin{lstlisting}[language=bash]
% minicom -D /dev/ttyUSB2
\end{lstlisting}
Then, configure minicom to use 115,200 Baud, 8 data bits, one stop bit, and
no parity.
Once you've done that, we can test it by sending a character across the UART
port:
\begin{lstlisting}[language=bash]
% wbregs tx 90
\end{lstlisting}
This should send a `Z' over the UART port. Did you see a `Z' in minicom?
If not, did you set the baud rate right? The UART is supposed to be set
for 115,200 Baud, 8N1 by default. If it is not, you can set it to that by
running {\tt wbregs setup 705}. The 705 comes from the clock rate, in Hz,
divided by 115200. With the other, higher order bits left at zero, this
selects an 8N1 serial port channel at that baud rate.
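The same division gives the setup value for other baud rates. The sketch below
assumes a system clock of 81.25~MHz. That figure is not stated in this section,
but it is consistent with the 705 above, since $705 \times 115200 \approx
81.2$~MHz. The name {\tt baud\_divisor} is likewise my own. Adjust
{\tt CLOCK\_HZ} if your build uses a different clock.
\begin{lstlisting}[language=bash]
# Assumed system clock rate, in Hz.  Not taken from this document.
CLOCK_HZ=81250000

# Print the number of clocks per baud for the baud rate given in $1.
# Leaving all higher order setup bits at zero selects 8N1.
baud_divisor() {
    echo $(( CLOCK_HZ / $1 ))
}

baud_divisor 115200     # prints 705
\end{lstlisting}
The result can be handed straight to wbregs, as in
{\tt wbregs setup \$(baud\_divisor 115200)}.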
Another fun program to run is {\tt netsetup}. This program takes no arguments,
and just reads and decodes the network registers via the MDIO port. The
decoded result will be sent to the screen.
\section{Subsequent Core Updates}
The board support file {\tt wbprogram} can be used to write .bit or .bin
files to the flash, so that the core can be updated once an initial core
is installed and running.
Although wbprogram expects the filename to end in either {\tt .bit} or
{\tt .bin}, this is primarily to keep a user from doing something they don't
intend to do.
The basic usage of the wbprogram command is:
\begin{lstlisting}[language=bash]
% wbprogram [@address] file
\end{lstlisting}
wbprogram then copies the file to the flash, starting at the Arty address
of {\tt @address}. If no address is given, wbprogram writes the file at the
beginning of flash.
An example of how to do this can be found in the {\tt program.sh} script.
{\tt program.sh} places the new configuration file into the alternate
configuration location. (An alternate script, zprog.sh, places the new
configuration at the beginning of the flash, where the FPGA loader will look
for it upon power up.) Once {\tt program.sh} places the new configuration
into flash, it then commands the FPGA via the ICAPE2 interface and an IPROG
command to reconfigure itself using this new configuration. As a result, this
can be used to load subsequent configurations into the FLASH.
\section{Building the ZipCPU tool-chain}
At this point, you should have some confidence that your configuration and
hardware are working. Therefore, let's transition to getting the ZipCPU
on the hardware up and running.
To do this, we'll start with getting a copy
of the ZipCPU toolchain and building it. Pick a directory to work in, and
then issue:
\begin{lstlisting}[language=bash]
% git clone https://github.com/ZipCPU/zipcpu
\end{lstlisting}
to get a copy of the ZipCPU project, together with its toolchain. You'll also
need to double check that you have the prerequisite packages to build this
toolchain, so on an Ubuntu~14 machine you would issue:
\begin{lstlisting}[language=bash]
% sudo apt-get install flex bison libbison-dev
% sudo apt-get install libgmp10 libgmp-dev libmpfr-dev libmpc-dev
% sudo apt-get install libelf-dev libisl-dev
\end{lstlisting}
Once these are all in place, you can then switch to the master ZipCPU
directory and type,
\begin{lstlisting}[language=bash]
% cd zipcpu; make
\end{lstlisting}
(You may need to issue the make command a couple of times\ldots)
This will build the GCC compiler for the ZipCPU from source.
It will also install this new compiler into
{\tt zipcpu/sw/install/cross-tools}.
This new compiler will be called {\tt zip-gcc}.
This will also build a copy of the binutils programs for the ZipCPU. These
include the assembler, {\tt zip-gas}, linker, {\tt zip-ld}, disassembler,
{\tt zip-objdump}, and many more useful programs.
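The next step to using this toolchain is to place it into your path. From the
directory in which you cloned {\tt zipcpu} (one level above the {\tt zipcpu}
directory itself), issue:
\begin{lstlisting}[language=bash]
% export PATH=$PATH:$PWD/zipcpu/sw/install/cross-tools/bin
\end{lstlisting}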
Once the toolchain is in your path,
\begin{lstlisting}[language=bash]
% which zip-gcc
/home/.../zipcpu/sw/install/cross-tools/bin/zip-gcc
\end{lstlisting}
should return the location of where this toolchain exists in your path.
\section{Building your first ZipCPU program}
Several example programs for the OpenArty project can be found in the
{\tt sw/board} directory. These can be used to test various peripherals from
the perspective of the CPU itself.
As a test of the build process, a good first program to build would be
{\tt exstartup}. This program is very similar to the {\tt startupex.sh} shell
script you tried earlier. It simply plays with the color LEDs and some
on-board timers. Once that is finished, it goes into a loop controlling
both the normal and the color LEDs based upon the button state and the switch
settings.
To build {\tt exstartup}, simply type {\tt make exstartup} from the
{\tt sw/board} directory of the {\tt openarty} project. (Don't forget to
include the ZipCPU toolchain in your path before you do this!)
\section{Loading a program}
Now that you have built your {\tt exstartup} program, it's time to load it
onto the board and start it up. The {\tt zipload} program can be used to
do this. {\tt zipload} can be found in the sw/host directory. To load a
ZipCPU program into the Arty, just type {\tt zipload} and the program name,
such as {\tt exstartup} in this case. To start the program immediately
after loading it, pass the `-r' option to {\tt zipload}. In our case, you
would type:
\begin{lstlisting}[language=bash]
% zipload -r exstartup
\end{lstlisting}
Hopefully, you can see the {\tt exstartup} program now toggling the LEDs.
Once the initial display stops, you can adjust the switches and press buttons
to see how that affects the result.
If you wish to restart the {\tt exstartup} program, or indeed to run another
program, you can just run {\tt zipload} again with the new program name. This
will halt the previous program, and then load the new one into memory. As
before, if you use the `-r' option, the program will be started automatically.
\section{Some other test programs}
If you have the PModUSBUART, you might wish to try running a ``Hello, World''
program. This can be found in the hello.c file. It prints ``Hello, World''
to the PModUSBUART once every ten seconds. Had enough of it? You can stop
the CPU by typing {\tt wbregs cpu 0x0400}. This sends a halt command to the
debug register of the ZipCPU. More information about this debug register, and
other things that can be done via the debug register, can be found in the
ZipCPU specification.
If you have both the PModUSBUART as well as the PModGPS, the {\tt gpsdump.c}
program can be used to forward the NMEA stream from the GPS to the USBUART.
This should give you some confidence that the PModGPS is working.
As a third test, {\tt oledtest.c} will initialize the OLEDrgb device and cause
it to display one of two images in an alternating fashion.
\chapter{Architecture}\label{ch:architecture}
My philosophy in peripherals is to keep them simple. If there is a default
mode on the peripheral, setting that mode should not require turning any bits
on. If a peripheral encounters an error condition, a bit may be turned on to
indicate this fact, otherwise status bits will be left in the off position.
\section{Bus Structure}
The OpenArty project contains four bus masters, three of them within the CPU.
These masters are the instruction fetch unit, the data read/write unit,
and the direct memory access peripheral within the ZipCPU, as well as an
external debug port which can be commanded from over the main UART port
connecting the Arty to its host.
There is also a second minor peripheral bus located within the ZipCPU
ZipSystem. This bus provides access to a number of peripherals within the
ZipSystem, such as timers, counters, and the direct memory access controller.
This bus will also be used to configure the memory management unit once
integrated. This bus is only visible to the CPU, and located starting at
address {\tt 0xff000000}.
The ZipCPU debug port is also available on the bus. This port, however, is
only visible to the external debug port. It can be found at address
{\tt 0x20000000} for the control register, and {\tt 0x20000004} for the
data register.
Once the MMU has been integrated, it will be placed between the instruction
fetch unit, data read/write unit, and the rest of the peripheral bus.
The actual bus chosen for this design is the Wishbone Bus, based upon the
pipeline mode defined in the B4 specification. All optional wires required
by this bus structure have been removed, such as the tag lines, the cycle
type identifier, the burst type, and so forth. This was done to simplify
the logic within the core.
However, because of the complicated bus structure, and particularly because
of the number of masters and slaves on the bus and the speed for which the bus
is defined, there are a number of delays and arbiters placed on the bus. As a
result, the stall wire, which is supposed to depend upon combinational logic
only, has been registered at a number of locations. What this means is that
there are a variety of delays as commands propagate through the bus structure.
Most of these are configurable, in that they can be turned on or off at build
time; likewise, the stall line may (or may not) be registered as configured.
All interactions between bus masters and any peripherals pass through the
interconnect, located in {\tt busmaster.v}. This interconnect divides the
slaves into separate groups. The first group of slaves are those to which the
bus is supposed to provide fast access. These are the DDR3 SDRAM, the
flash, the block RAM, and the network. The next group of slaves will have their
acknowledgements delayed by an additional clock. The final group of slaves
are those single register slaves whose results may be known ahead of any read,
and who only require one clock to access. These are grouped together and
controlled from within {\tt fastio.v}.
Further information about the Wishbone bus structure found within this core
can be found either in the Wishbone datasheet (Ch.~\ref{ch:wishbone}), or in
the memory map table in the Registers chapter (Ch.~\ref{ch:registers}).
\section{DDR3 SDRAM}
{\em It is the intention of this project to use a completely open source
DDR3 SDRAM controller. While the controller has been written, it has yet to
be successfully connected to the physical pins of the port. Until that time,
the design is running using a Wishbone to AXI bus bridge. Memory may still
be read or written, after an initial pipeline delay of roughly 27~clocks per
access, at one access per clock.}
{\em The open source SDRAM controller should be able to achieve a delay closer
to 9~clocks per access--once I figure out how to connect it to the PHY.}
\section{Flash}
\section{Block RAM}
The block RAM on this board has been arranged into one 32kW section.
Programs will run fastest when they use the block RAM, both for
instructions and for data.
\section{Ethernet}
The ethernet controller has been split into three parts. The first part is
an area of packet memory. This part is simple: it acts like memory. The
receive memory is read only, whereas the transmit memory is both read and
write. Packets received by the controller will be found in the receive memory;
packets to be transmitted must be placed in the transmit area of memory. The
octets are stored with the first octet in the most significant byte of each
word. This is the easy part.
The format of the packets within this memory is a touch more interesting.
With no options turned on, the first 6~bytes are the destination MAC
address, the next 6~bytes will be the source MAC address, and the {\em next
4~bytes} will be the EtherType repeated twice. This was done to align the
packet, and particularly the IP header, onto word boundaries. If the hardware
CRC has been turned off, the packet must contain its own CRC as well as
ensuring that it has a minimum packet length (64 octets) when including that
CRC.
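To make the layout with no options turned on concrete, the sketch below prints
the first four words of a transmit packet as they would appear in packet
memory, with the first octet in the most significant byte of each word. The
function name {\tt eth\_header\_words} and the example addresses are
hypothetical. The sketch only formats the words, and doesn't touch the
hardware.
\begin{lstlisting}[language=bash]
# Print the four 32-bit header words of a packet:
#   $1 = destination MAC, 12 hex digits
#   $2 = source MAC,      12 hex digits
#   $3 = EtherType,        4 hex digits (emitted twice, for alignment)
eth_header_words() {
    printf '%s%s%s%s\n' "$1" "$2" "$3" "$3" | fold -w 8 | sed 's/^/0x/'
}

# A broadcast IPv4 packet from the made-up address 00:11:22:33:44:55
eth_header_words ffffffffffff 001122334455 0800
\end{lstlisting}
The sixteen header octets fill exactly four words, so whatever follows (the IP
header, for instance) begins on a word boundary.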
With all options turned on, however, things are a touch simpler. The first
two words of the packet contain the destination MAC (for a transmit packet)
or the source MAC (for a received packet), followed by the two-octet
EtherType. At this point the packet is word-aligned prior to the IP header.
Since broadcast packets are sent to a special destination MAC other than
our own, a flag in the command register will indicate this fact.
The second part of the controller is the MDIO interface. This follows from
the specification, and can be used to toggle the LEDs on the ethernet,
to force the ethernet into a particular mode, either 10M or 100M, to control
auto-negotiation of the speed, and more. Reads or writes to MDIO memory
addresses will command reads or writes via the MDIO port from the FPGA to the
ethernet PHY. As the PHY can only handle 16-bit words, only 16~bits will
ever be transferred as a result of any read/write command; the top 16~bits
are automatically set to zero. Further details of this capability may be
found within the specification for the chip.
The MDIO interface may be ignored. If ignored, the defaults within the
interface will naturally set up the network connection in full duplex mode (if
your hardware supports it), at the highest speed the network will support.
However, if you ignore this interface, you may never learn of any problems
it would have reported. The {\tt netsetup} program has been
provided, among the host software, to help diagnose how the various MDIO
registers have been set, and what status is being reported from
the PHY.
The third part of the controller is the packet command interface. This
consists of two command registers, one for receiving and one for transmitting.
Before doing anything with the network, it must first be taken out of
reset. According to the specification for the network chip, this must
happen a minimum of one second after power up. This may be done by simply
writing to the transmit command register with the reset bit turned off.
To send a packet, simply write the number of octets in the packet to the
transmit control register and set the GO bit ({\tt 0x04000}). Other bits
in this control register can be used to turn off the hardware MAC generation
(and removal upon receive), the hardware CRC checking, and/or the hardware
IP header checksum validation (but not generation). The GO bit will remain
high while the packet is being sent, and only transition to low once the
packet is away. While the packet is being sent, a zero may be written to the
command register to cancel the packet, although this is not recommended.
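As a sketch of the arithmetic involved, the helper below forms the word to
write in order to send a packet. It assumes that the packet length occupies the
bits below the GO bit, and that every other bit is left at zero (hardware MAC
and CRC handling on, reset off). The name {\tt tx\_command} is hypothetical.
\begin{lstlisting}[language=bash]
GO_BIT=$(( 0x4000 ))    # the GO bit given above

# Print the transmit command word for a packet of $1 octets.
# Lengths that would collide with the GO bit are rejected.
tx_command() {
    if [ "$1" -lt 1 ] || [ "$1" -ge "$GO_BIT" ]; then
        echo "length out of range" >&2
        return 1
    fi
    printf '0x%08x\n' $(( $1 | GO_BIT ))
}

tx_command 64           # a minimum length packet: 0x00004040
\end{lstlisting}
Reading the register back and testing it against {\tt 0x4000} then tells you
whether the packet is still on its way out.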
Packets are automatically received without intervention. Once a packet has been
received, the available bit will be set in the receive command register and
a receive packet interrupt will be generated. The ethernet port will then
halt/stall until a user has reset the receive interface so that it may
receive the next packet. Without clearing this interface, the receive port
will not accept further packets. Other status bits in this interface are
used to indicate whether packets have been missed (because the interface was
busy), or thrown out due to some error such as a CRC error or a more general
error.\footnote{It should be possible to extend this interface so that further
packets may be read as long as the memory isn't yet full. This is left as an
exercise to others.}
\section{SD Card}
\section{GPS Tracking}
\section{Configuration port}
The registers associated with the ICAPE2 port have been made accessible
to the core via the {\tt wbicapetwo} core. More information about the meaning
of these registers can be found in Xilinx's ``7~Series FPGAs Configuration
User Guide''.
Testing with the OpenArty board has tended to focus on the warmboot capability.
Using this capability, a user is able to command the FPGA to reload its
configuration. In support of this, two configuration areas have been
defined within memory. The first is the default configuration, found at
the beginning of the flash. This configuration is sometimes called the ``golden
configuration'' within Xilinx's documentation because it is the configuration
that the Xilinx device will always start up from after a power on reset. On
the OpenArty, a second configuration may immediately follow the first in flash.
Commanding the FPGA to reload its configuration is as simple as
setting the WBSTAR (warm boot start address) register to the location of the
new configuration within the flash, and then writing a 15 (a.k.a. IPROG)
to the FPGA command register (offset 4 from the beginning of the ICAPE2
addresses). Examples of doing this are found in the
{\tt sw/host/zprog.sh} and {\tt sw/host/program.sh} scripts. The former
programs the default configuration and then switches to it, while the latter
does the same with the alternate configuration.
This configuration capability makes it possible for a user to 1) reprogram
the flash with an experimental configuration in the second configuration
location, and 2) test the configuration without actually touching the board.
If the configuration doesn't work well enough to be communicated with, the
board may simply be powered down and it will come back up with the initial
or golden configuration. If the golden configuration ever gets corrupted,
or loaded with a configuration that will not work, then the user will need to
reload the FPGA from the JTAG port.
\section{OLED}
\section{Real Time Clock}
The OpenArty design contains a real-time clock core together with a companion
real-time date/calendar core. The clock core itself contains not only current
time, but also a stopwatch, seconds timer, and alarm. The real-time date core
can be used to maintain the current date. The real-time clock core uses the
GPS PPS output, as schooled by the GPS tracking circuit, in order to synchronize
its subsecond timing to the GPS itself. Further, the real-time clock core
then creates a synchronization wire for the real-time date core.
Neither of these cores exports its subsecond precision to the rest of the
design. This must be done using either the internal GPS tracking wires, or
by reading the time information from the tracking test bench.
\section{LEDs}
The Arty board contains two sets of LEDs: a plain set of LEDs, and a colored
set of LEDs.
The plain set of LEDs is controlled simply from the LED register. This register
can be used to turn these LEDs on and off, either individually or as a whole.
It has been designed for atomic access, so only one write to this register
is necessary to set any particular LED.
The color LEDs are slightly different. Each color LED is supported by its
own register, which controls three pulse width modulation controllers. Three
groups of eight bits within the color LED register control the PWM thresholds,
first for red, then green, and then in the lowest bits for blue. These are
used to turn on and off the various color components of the LEDs. Using this
method, there are $2^{24}$ different colors each of these LEDs may be set
to.
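The packing just described can be sketched as follows, with red in bits 23
through 16, green in bits 15 through 8, and blue in bits 7 through 0. The
function name {\tt color\_word} is my own, and each argument is a PWM threshold
between 0 and 255.
\begin{lstlisting}[language=bash]
# Pack three 8-bit PWM thresholds into one color LED register word:
#   $1 = red, $2 = green, $3 = blue
color_word() {
    printf '0x%06x\n' \
        $(( (($1 & 0xff) << 16) | (($2 & 0xff) << 8) | ($3 & 0xff) ))
}

color_word 255 128 0    # full red, half green, no blue: 0xff8000
\end{lstlisting}
The resulting word is what you would write to the register of whichever color
LED you wish to set.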
\section{Buttons}
\section{Switches}
\section{Startup counter}
A startup counter has been placed into the basic peripheral I/O area. This
counter simply counts the clocks since startup. Upon rollover, the high
order bit remains set. This can be used to sequence the start up of components
within the design if so desired.
\section{GPS UART}
The GPS UART, debug control UART, as well as the auxiliary UART, are all
based upon the same underlying UART IP core, sometimes known as the WBUART32
core. The setup register is defined within the documentation for that core,
and provides for a large baud rate selection, 5--8 data bits, 1--2 stop bits,
and several parity choices. Within OpenArty, the GPS core is initialized
to 9.6~kBaud, 8 data bits, no parity, and one stop bit.
When a value is ready to be read from the GPS UART, the GPS interrupt line
will go high. Only once that value has been read will this interrupt line reset.
If the read is successful, only bits within the bottom eight will be set.
If a read is attempted when there is no data, when the UART is in a reset
condition, or when there has been a framing or parity error (were parity
to be turned on), the upper bits of the UART port will be set.
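A reader of this port therefore only needs to check whether anything above the
bottom eight bits is set. The following is a minimal sketch of that check. The
name {\tt uart\_rx} is hypothetical, and the sketch makes no attempt to tell the
individual error conditions apart, since which upper bit signals which
condition is defined in the WBUART32 documentation rather than here.
\begin{lstlisting}[language=bash]
# Classify a word read from the GPS UART port, given in $1.
# Prints the character code if only the bottom eight bits are in use;
# otherwise prints "invalid" and returns a failure status.
uart_rx() {
    if [ $(( $1 & ~0xff )) -eq 0 ]; then
        echo $(( $1 & 0xff ))
    else
        echo "invalid"
        return 1
    fi
}

uart_rx 0x47            # a good read, prints 71 (the letter G)
uart_rx 0x100           # some upper bit set, prints invalid
\end{lstlisting}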
In a like manner, the GPS device can be written to. Certain strings, if sent
to the UART, can be used to change the UART's baud rate, its serial port
settings, or even its reporting interval. As with the read port, the transmit