Kasli v1.2 wishlist #5

Closed
3 of 12 tasks
marmeladapk opened this issue Jun 13, 2018 · 80 comments
@marmeladapk
Member

marmeladapk commented Jun 13, 2018

From @marmeladapk on 2018-01-31 09:53

  • Rotate SATA connector
  • Have I2C and UART signals in the same bank (each of them)
  • Switch all components to our standard library
  • Switch U1 to non-BGA version
  • Maybe a bigger heat sink with clamps, pending measurements with a big ARTIQ setup
  • Move the clock input SMA further towards the panel. Leave just enough gap for the insulating washer. Otherwise (with a gap, the two insulating washers, the panel, and the nut, even without the springy/claw washer) there is not enough thread to fully tighten an SMA connector.
  • Si5369 instead of Si5324+ADCLK944 (power, board space, components, cost). Or even Si5346
  • Move UART RxD away from bank 13
  • Replace 100k resistors with 0R on the SMA clock input and MMCX clock outputs to tie all grounds together.
  • Add note on schematic by the connectors saying something like "To float connector grounds, replace 0R with 100k. NB Kasli must be connected to mains ground to avoid damage to it or other equipment connected to it!"
  • Add TVS on JTAG
  • Make board 2-3 mm shorter, so that technosystem doesn't need to grind each board to mount backplane connector... (they told me about it after five months!)
@marmeladapk marmeladapk self-assigned this Jun 13, 2018
@marmeladapk
Member Author

From @hartytp on 2018-01-31 14:02

Is there a Kasli v1.2 planned for any time in the future?

@marmeladapk
Member Author

From @jordens on 2018-01-31 14:38

I think we need more experience with it in the field to be able to say that.

@marmeladapk
Member Author

From @marmeladapk on 2018-01-31 14:44

First let's see how v1.1 operates; IMO the points in the top post are not worth it right now.

@marmeladapk
Member Author

From @marmeladapk on 2018-02-02 21:04

Change the Si5324 (loss of lock) LOL LED to red (as it indicates an error condition) or invert it.

@jordens Oh come on! You told me to change it from red to green so there are fewer items in the BOM. :D

Anyway LOL polarity can be changed in register 22 B1 of Si5324.
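
A polarity change like this is usually done as a read-modify-write, so the other bits in the register keep their values. The sketch below shows only the bit arithmetic. The register number (22) and bit position (1) are taken from the comment above; the helper name `with_bit` and the example register contents are hypothetical, and no real I2C access is performed.

```python
LOL_POL_REGISTER = 22  # register number as quoted in the comment above
LOL_POL_BIT = 1        # bit position as quoted in the comment above


def with_bit(value: int, bit: int, level: bool) -> int:
    """Return an 8-bit register value with one bit forced to `level`."""
    mask = 1 << bit
    if level:
        return (value | mask) & 0xFF
    return value & ~mask & 0xFF


# Hypothetical register contents; a real driver would read this over I2C.
current = 0b1101_1111
bit_is_set = bool((current >> LOL_POL_BIT) & 1)

# Invert only the polarity bit and leave every other bit untouched.
inverted = with_bit(current, LOL_POL_BIT, not bit_is_set)
print(f"reg {LOL_POL_REGISTER}: {current:#010b} -> {inverted:#010b}")
```

A real driver would write `inverted` back to the chip after computing it; that step is omitted here because the I2C interface is outside what this thread describes.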

@marmeladapk
Member Author

From @jordens on 2018-02-03 07:57

Ah. Right. Then it's perfect. I forgot about both. ;)

@marmeladapk
Member Author

From @hartytp on 2018-03-28 21:45

The biggest thing I'd like to see changed on the list above is the heatsink, since the FPGA gets very hot atm. It might be worth going for a heatsink with a clip-on fan.

@marmeladapk
Member Author

marmeladapk commented Jun 13, 2018

From @hartytp on 2018-03-28 23:31

  • Replace 100k resistors with 0R on the SMA clock input and MMCX clock outputs to tie all grounds together.
  • Add note on schematic by the connectors saying something like "To float connector grounds, replace 0R with 100k. NB Kasli must be connected to mains ground to avoid damage to it or other equipment connected to it!"

@marmeladapk
Member Author

From @marmeladapk on 2018-03-29 08:30

I think we're slowly approaching the point where we can think about the next revision. Are there any other things we'd like to test before I start implementing changes? What's the consensus on Si5324/Si5369/Si5346?

@marmeladapk
Member Author

From @hartytp on 2018-03-29 09:06

@marmeladapk In the long run, I'm still potentially keen to implement WR on Kasli. Probably use a DAC + high-quality VCO for clock recovery. Then either use a LVPECL clock buffer (noise isn't critical here, so there are lots of options) or something like an AD9516-4 to do the fanout.

However, we're still doing a design study to make sure we get that right, so we won't have a design for that for another week or two.

Until/unless we switch to WR, I'm not fussed about any of the options being discussed. IMHO, the present clocking works well (modulo the stability/phase determinism issues with the Si5324) and none of the options presented above offer a good enough advantage in terms of cost/power/simplicity to be worth changing a working design and risking breaking things. However, if @jordens feels strongly about it, I don't object either.

@marmeladapk
Member Author

From @jordens on 2018-03-29 09:21

Kasli must be connected to mains ground to avoid damage to it or other equipment connected to it!

I don't think that is accurate and might even be wrong. I'd state how much potential difference we are willing and able to tolerate and what the actual ground paths in the system are. Like it is done on all measurement equipment.

@sbourdeauducq wanted to do tests with the Si5326 to guide that decision.

And I don't think a big heat sink will cut it. We have been equipping them with fans.

@marmeladapk
Member Author

From @hartytp on 2018-03-29 09:30

And I don't think a big heat sink will cut it. We have been equipping them with fans.

👍 Something like the heat sink on the KC705 would be nice.

I don't think that is accurate and might even be wrong. I'd state how much potential difference we are willing and able to tolerate and what the actual ground paths in the system are. Like it is done on all measurement equipment.

AFAICT, connecting PCB ground to mains ground is the most fool-proof solution, which should prevent damage in basically all cases (this is what almost all T&M equipment does), so it's an easy, safe recommendation to make -- I'm not aware of any situation where this could be dangerous/lead to damage, even if it's not often/always optimal from a noise perspective. Maybe change "must" to "should", or even just re-word it to say that the potential difference between all grounds must be limited to safe levels for all equipment, for example by connecting Kasli to mains ground?

Having said that, if you have a better suggestion, then feel free to make it (can you give exact text, please, including values for potential differences you want to recommend).

@marmeladapk
Member Author

From @hartytp on 2018-03-29 09:33

@jordens @sbourdeauducq I believe the answer to this is "no", but to double check: we don't think that a bigger FPGA/higher speed grade would help with anything we're doing? e.g. for large Kasli designs, we're not close to being limited by FPGA resources, right? and, the higher speed grade wouldn't ease the CPU timing issues?

@marmeladapk
Member Author

From @sbourdeauducq on 2018-03-29 10:28

After the siphaser system we introduced, I don't think the 5326 would improve anything significantly, it would just save something like one or two MMCMs in the FPGA since we can use the skew control registers instead. And if we start having it on any board, then we need to support both the 5324 and 5326 in the firmware. This family of chips appears to be exceptionally well-designed (case in point: the 5324 and 5326 are pin-compatible) and bug-free, so it's not a big issue if that has to be done, but why should we?

@marmeladapk
Member Author

From @sbourdeauducq on 2018-03-29 10:38

Note that the 5326 does not have deterministic latency - all it brings to the table is built-in functionality to increase or decrease whatever random skew it has after locking, and higher loop bandwidth. So, it doesn't help with getting deterministic phase from the external clock input to the ARTIQ outputs.

I am in favor of either:

  • keeping the 5324 (the engineering costs associated with the change and the maintenance of multiple clock chip variants in the firmware do not seem to be worth using a 5326, 5369 or 5346).
  • designing a high-performance PLL a la White Rabbit.

@marmeladapk
Member Author

From @hartytp on 2018-03-29 11:06

Thanks @sbourdeauducq. In that case, here is my suggestion:

  • @WeiDaZhang keeps working on design of a WR implementation for Sinara, as well as carefully characterizing the performance of our current Si5324 setup and simulating/taking data on the performance of the WR system.
  • We agree on a cut-off date for changes to the design for the next version of Kasli
  • Once that cut-off date arrives:
    • if @WeiDaZhang's design isn't ready we stick with the current Si5324.
    • if we decide on the basis of @WeiDaZhang's measurements that the benefits from WR aren't sufficient to justify the changes, we stick with the Si5324
    • otherwise, we implement WR on Kasli for the next version.

Everyone happy with that plan? If so, what's the deadline for this decision?
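
The cut-off logic proposed in the list above can be written out as a small decision function. This is only an illustrative sketch: the function name, the two boolean inputs, and the returned labels are hypothetical and simply restate the three bullet points.

```python
def choose_clocking(wr_design_ready: bool, wr_benefits_sufficient: bool) -> str:
    """Pick the clocking option at the cut-off date, following the plan above."""
    if not wr_design_ready:
        # The WR design is not ready by the cut-off date.
        return "Si5324"
    if not wr_benefits_sufficient:
        # Measurements show WR does not justify the changes.
        return "Si5324"
    return "WR"


for ready in (False, True):
    for sufficient in (False, True):
        print(ready, sufficient, "->", choose_clocking(ready, sufficient))
```

WR is chosen only when both conditions hold; every other combination keeps the current Si5324.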

@marmeladapk
Member Author

From @sbourdeauducq on 2018-03-29 11:18

Change the Si5324 (loss of lock) LOL LED to red (as it indicates an error condition) or invert it.

@jordens Oh come on! You told me to change it from red to green so there are fewer items in the BOM. :D

Do we need this LED at all? Lock status is accessible from the firmware. I've never used that LED personally.

@marmeladapk
Member Author

From @jbqubit on 2018-03-29 12:55

AFAICT, connecting PCB ground to mains ground is the most fool-proof solution

Agreed that this is the most fool-proof. The default configuration should protect casual end users as well as isolate the manufacturer from liability. The grounding implementation could be made so that it's easy to modify. Then modifications which might cause harm to body or the board itself are at the risk of the end user.

@marmeladapk
Member Author

From @sbourdeauducq on 2018-03-29 16:49

Everyone happy with that plan?

Sounds fine, but integrating the WR PLL into ARTIQ doesn't sound trivial; we need to plan for the manpower and development time in the firmware and gateware (in addition to the hardware changes).

@marmeladapk
Member Author

From @hartytp on 2018-03-29 17:14

Absolutely, yes. That's an essential part of the cost / benefit analysis. But let's get a concrete proposal to discuss first...

@marmeladapk
Member Author

From @hartytp on 2018-04-01 08:07

Sounds fine, but integrating the WR PLL into ARTIQ doesn't sound trivial; we need to plan for the manpower and development time in the firmware and gateware (in addition to the hardware changes).

If we do go down the WR route, I'd still want to keep the Si5324 as well for at least the next version. Obviously, we would want to be able to use Kasli even while the WR gateware/firmware is developed and debugged.

@marmeladapk
Member Author

From @hartytp on 2018-04-01 08:19

Is it worth considering switching to a Kintex FPGA and maybe increasing the RAM width (cf. the DMA issues @cjbe reported) for the next version?

Speed seems to be by far the biggest complaint of ARTIQ users, and the fact that Kasli is noticeably slower than the KC705 setups we've used in the past seems like a major step in the wrong direction. I'm all for optimising gateware/firmware, but it seems silly not to start from the fastest hardware platform we reasonably can -- I'm not sure about other users (@dhslichter @dtcallcock etc), but I would gladly pay a bit more for HW if it made my setups faster.

@marmeladapk
Member Author

From @jordens on 2018-04-01 09:00

I'm against that. Let's keep Kasli at the simple end. It was well known and acknowledged that it would be slower. Wider RAM will lead to more board space and power, thermal issues, and redesigns. You are obviously free to fund a new device with a bigger FPGA though.

@marmeladapk
Member Author

From @hartytp on 2018-04-01 14:09

Let's keep Kasli at the simple end.

"Simple" doesn't have to equal "slow". I'm not convinced that putting a faster FPGA on there makes it not a simple design.

It was well known and acknowledged that it would be slower.

Really? That wasn't my impression. When I discussed this via email with you and @sbourdeauducq before Kasli v1.0's design was finalised I explicitly asked about whether there would be CPU frequency issues with the ARTIQ, and was told that there wouldn't be.

In any case, I think this point is largely irrelevant. What matters is whether, having used this in the lab and knowing what we know now, we still think the current design is the right one for the users, or whether changing the FPGA would be better. Let's not get hung up on why decisions were made.

You are obviously free to fund a new device with a bigger FPGA though.

Firstly: I read that to imply that you are funding work on the next Kasli revision. Is that actually true? Does your contract with WUT specify more than the standard two design rounds? If not, is this something that @marmeladapk and @gkasprow are doing on their own steam without any funding? If so, I don't see why you're bringing up funding here.

Secondly: I've worked hard to avoid hardware fragmentation in this project because I (still) believe that's the only way we're going to get a set of high-quality, well supported hardware which is stocked at good prices from a commercial vendor. If we all take the line of "this is my project, so if you don't like it then make your own version" then we're going to end up with a multiplicity of shoddy boards. I think we can be a bit more mature than that and work to find solutions that work for everyone.

Thirdly: while you may have funded the original version of Kasli, if you want someone like Creotech to stock it then they have to believe that it's what the users want. So, let's have a discussion that focuses on technical points, rather than shutting things down with "this is my project, go away".

Wider RAM will lead to more board space and power, thermal issues, and redesigns.

You've made this kind of assertion several times in this project only to be contradicted by @marmeladapk, who is actually doing the design work and has done the simulations. If you've done a simulation or have anything concrete to back up these claims then I'd love to hear about them. But, otherwise, I'd rather hear from @gkasprow or @marmeladapk.


tl;dr: if other users don't think a bigger FPGA is worth it (maybe this is worth addressing to the ARTIQ mailing list), or if @gkasprow or @marmeladapk think that it would be too much work/cause other issues, then let's leave it as is. But, if there are simple changes that can make Kasli work better for the users then we should consider them.

After all, it's not like the current FPGA on Kasli isn't causing problems right now, and that makes me concerned that in the long run it's not a very good choice.

@marmeladapk
Member Author

From @hartytp on 2018-04-01 14:12

It was well known and acknowledged that it would be slower.

Again, I'd love to hear from one of the other groups who are actually using ARTIQ to run experiments with (e.g. @dtcallcock @dhslichter) but my feeling is that the current slowness of ARTIQ makes it a massive pain in the neck for most use cases. Anything that makes it even slower is of very limited interest as far as I'm concerned.

@marmeladapk
Member Author

From @hartytp on 2018-04-01 14:40

To be a bit more concrete here, my concerns are things like: if we're struggling to make ARTIQ meet timing on Kasli as it is, what will happen when we want to add features like hard floating-point maths? Will we just have to accept that they aren't available on Kasli because we put a slow FPGA on it?

@marmeladapk
Member Author

From @sbourdeauducq on 2018-04-01 14:57

It was well known and acknowledged that it would be slower.

Really? That wasn't my impression. When I discussed this via email with you and @sbourdeauducq before Kasli v1.0's design was finalised I explicitly asked about whether there would be CPU frequency issues with the ARTIQ, and was told that there wouldn't be.

I guess @jordens is talking about the RAM, which is obviously slower than on KC705 (16-bit vs. 64-bit data bus).
How strongly Vivado insists on making mor1kx systems slow on Artix-7, on the other hand, is a bit of a surprise. Switching to 7K70T might be OK (it's not much more expensive), if it weren't for the major PCB design change, plus another round of transceiver yak-shaving to make DRTIO and Ethernet work again (among their many problems, transceivers are not compatible between FPGA families and each comes with its own set of idiosyncrasies and obscure bugs).
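
To put a rough number on the data bus comparison, the sketch below computes peak bandwidth from bus width alone. It assumes both memories run at the same transfer rate, which the thread does not state; the rate used is a hypothetical placeholder, so only the ratio is meaningful.

```python
def peak_bandwidth_bytes_per_s(bus_width_bits: int, transfers_per_s: float) -> float:
    """Peak bandwidth = bytes moved per transfer * transfers per second."""
    return bus_width_bits / 8 * transfers_per_s


# Hypothetical transfer rate, assumed identical for both boards.
ASSUMED_TRANSFERS_PER_S = 800e6

kasli = peak_bandwidth_bytes_per_s(16, ASSUMED_TRANSFERS_PER_S)  # 16-bit bus
kc705 = peak_bandwidth_bytes_per_s(64, ASSUMED_TRANSFERS_PER_S)  # 64-bit bus
ratio = kc705 / kasli

print(f"Kasli: {kasli / 1e9:.1f} GB/s, KC705: {kc705 / 1e9:.1f} GB/s, ratio: {ratio:.0f}x")
```

Under that equal-rate assumption the 64-bit bus has four times the peak bandwidth of the 16-bit one, independent of the actual rate chosen.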

@marmeladapk
Member Author

From @sbourdeauducq on 2018-04-01 15:28

How strongly Vivado insists on making mor1kx systems slow on Artix-7, on the other hand, is a bit of a surprise.

Part of the reason it's a surprise is because uniprocessor systems (e.g. the DRTIO satellite, and other MiSoC ports to Artix-7 boards) meet timing; the problems appear with the ARTIQ dual-CPU design for some reason.

@marmeladapk
Member Author

From @sbourdeauducq on 2018-04-13 06:49

They are quite affordable:

They are one order of magnitude more expensive than the current solution, even when using power adapters from reputable brands purchased from a EU/US distributor.
But yes, we can consider them. I prefer the screw terminal option, as the cables can be mounted into any subrack chassis, not just one compatible with some backplane. The cables are most likely at least one order of magnitude cheaper than the backplane, too; and the mechanical design is easier.

@marmeladapk
Member Author

From @sbourdeauducq on 2018-04-13 07:01

Maybe just mount something like this on a piece of metal with a front panel and appropriate grounding connections:
[image: 14098-01]
There are many vendors for this type of power supply and they are very cost-effective.

@marmeladapk
Member Author

From @gkasprow on 2018-04-13 15:13

This was exactly what I proposed some time ago. I'd rather use a resonant SMPS since it offers much higher efficiency, smaller size and lower EMI. We use one in the Booster design.

@marmeladapk
Member Author

From @hartytp on 2018-04-13 15:17

@gkasprow Yes, you did, but we couldn't all agree on what we wanted.

Having had a play with Kasli for a while, I think that this is the best way to go. If some users don't want it then we can make it optional. Just make sure it has some enclosure so that there is no exposed mains.

@marmeladapk
Member Author

From @gkasprow on 2018-04-13 15:17

@dhslichter shouldn't the C14 be installed on the front panel?
This way, if we use such a panel with an EMC spring, the ground will be automatically connected to the chassis.
Moreover, in that case we do not expose mains cables to the user, who might by chance fail to attach them to the PSU and leave them floating, creating a lethal threat.

@marmeladapk
Member Author

From @dhslichter on 2018-04-13 19:32

@gkasprow C14 in the front panel is OK, but I am nervous in case (for example) the power supply isn't fully inserted, if you are relying on the panel grounding to make contact. If you do it this way, I would still use a wire and screw contact to chassis for grounding, with the idea that one doesn't remove this front panel with the C14.

In general, I propose the following:

  • C14 in the back panel, fixed (non-removable). Attach the ground of the C14 to a screw on the chassis body using a wire. This could be a blank front panel if people prefer (cheaper than machining the back panel perhaps?), but importantly the ground connection should still be via wire to a screw in the chassis, NOT relying on front panel EMC gaskets.
  • No backplane, just wires with barrel connectors to power the individual boards inside the chassis, +12V and GND. No "plug and play" power supplies - too much engineering to make this work, supplies are pricier, and people are unlikely to change the power supply often (if ever).
  • Power supply should be resonant SMPS, enclosed chassis style as shown by @sbourdeauducq above. Connections should be either quick-connect spade lugs or screw terminals. I like the idea of buying a chassis-mount supply which can be mounted to a metal plate, or better yet to the side of the chassis. Fixed blank front panel works fine here if we just mount to the chassis wall.
  • Low side of power supply DC output should be grounded to chassis with a wire and a screw into the chassis (can be done by tying V- to ground on the SMPS itself).
  • All boards will therefore receive +12V and GND return via barrel connectors (or other cable), where the GND return is referenced to the chassis and thus the wall ground, all via screw or spade connections, for safety. Not relying on front panels to make the ground connection, so you don't have to worry about whether the cards are pushed all the way in from a safety standpoint.
  • From an EMI standpoint, chromate the back side of all front panels and use the standard EMC gasketing on the chassis. This means the inside of the chassis will be shielded when fully populated.
  • Make a ground connection between all boards and their respective front panels, which ensures front panel safety even when not fully inserted into chassis.

The downside here is that return currents could flow either through the front panel and EMC gasketing or through the actual ground return wire on the power connectors for each board. I think this is an unavoidable problem, though. And this way you have backup in case the front panel isn't fully inserted because the ground reference is provided through the return wire still.

@marmeladapk
Member Author

From @hartytp on 2018-04-13 20:45

Whatever we do, let's make sure we don't have any exposed mains voltages, otherwise our electrical safety guys will complain.

@marmeladapk
Member Author

From @sbourdeauducq on 2018-04-14 17:10

I'd rather put the C14 on the front panel to enhance compatibility with different models of 3U subracks (multiple sources, half-width racks, shielded/unshielded, etc.) and reduce machining-related headaches and costs.
We can add a warning that the power supply should be fully inserted. There are many ways to break the hardware and this is just one of them.
Shouldn't the earth be also connected to the PSU's output ground and then to the daughtercards via the EEM cables anyway?
What level of insulation is acceptable? Are insulated crimped terminal plugs OK?

@marmeladapk
Member Author

From @sbourdeauducq on 2018-04-17 03:14

Actually, maybe the front panels should not be connected to ground on the EEMs, and grounding/earthing goes through designated cables (are the EEM cables enough?) inside.
This way, we could have multiple galvanically-isolated subsystems in a single subrack, connected via DRTIO+fiber.

@marmeladapk
Member Author

From @sbourdeauducq on 2018-04-27 03:21

  • Add an internal power connector option (maybe something like this)
  • Write "DO NOT HOTPLUG" next to the EEM connectors (@hartytp ?)

@marmeladapk
Member Author

From @hartytp on 2018-04-27 08:26

Add an internal power connector option (maybe something like this)

If you want that, I'd go for a Molex Mini-Fit Jr or similar (i.e. the same kind of thing used in ATX supplies). Way more robust than screw terminals for this kind of thing.

@marmeladapk
Member Author

From @sbourdeauducq on 2018-04-27 08:27

Fine.

@marmeladapk
Member Author

From @hartytp on 2018-04-27 08:30

(If you have a strong preference for something else then I don't particularly mind, just saying how I would do it based on my experience with these kinds of things.)

@marmeladapk
Member Author

From @sbourdeauducq on 2018-04-27 08:33

We can just have an actual ATX connector, since Kasli is also 12V.

@marmeladapk
Member Author

From @hartytp on 2018-04-27 08:34

Yep.

@sbourdeauducq
Member

With all the changes (WR, more EEMs) I think it should be called 2.0, not 1.2 :)

@marmeladapk
Member Author

marmeladapk commented May 7, 2019

@hartytp, @jordens could you please make sure that the wishlist in the top post is up to date?

Do we want to implement WR CDR as in Sayma and Metlino?

Here's the list of signals I'd like to move to I2C extenders (and route interrupt signals to the FPGA instead):

  • SFP 0,1,2 (TX_DISABLE, RATE_SELECT, RATE_SELECT1, TX_FAULT, MOD_PRESENT, LED, LOS)
  • led_user1-3
  • VUSB_PRESENT
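
As a rough check on how many FPGA pins this list would free up, the sketch below counts the signals named above. It reads `led_user1-3` as three separate LEDs, ignores the interrupt lines that would be routed back to the FPGA, and assumes a 16-bit I2C expander; the expander width and all variable names are hypothetical, since the thread does not name a part.

```python
import math

# Per-SFP signals, exactly as listed above.
SFP_SIGNALS = [
    "TX_DISABLE", "RATE_SELECT", "RATE_SELECT1",
    "TX_FAULT", "MOD_PRESENT", "LED", "LOS",
]
SFP_COUNT = 3  # SFP 0, 1, 2

# "led_user1-3" is read here as three individual LEDs.
OTHER_SIGNALS = ["led_user1", "led_user2", "led_user3", "VUSB_PRESENT"]

ASSUMED_EXPANDER_WIDTH = 16  # hypothetical expander I/O count


def expanders_needed(signal_count: int, width: int) -> int:
    """Smallest number of expanders with `width` I/Os that fits all signals."""
    return math.ceil(signal_count / width)


total_signals = SFP_COUNT * len(SFP_SIGNALS) + len(OTHER_SIGNALS)
print(total_signals, expanders_needed(total_signals, ASSUMED_EXPANDER_WIDTH))
```

With these assumptions the list covers 25 signals, which would need two 16-bit expanders.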

@jordens
Member

jordens commented May 7, 2019

  • All the other open issues should also be triaged for v1.2. There are some important ones in there.
  • WR CDR once the first iteration has flown successfully beyond bread boards and the Si5324 should be scrapped then.
  • Ok moving the signals to an extender apart from the three SFP LEDs please. The other user_leds can go away.

@hartytp

hartytp commented May 7, 2019

@hartytp, @jordens could you please make sure that the wishlist in the top post is up to date?

I'll go through this issue and break the items out into individual issues.

@hartytp

hartytp commented May 8, 2019

@marmeladapk I re-read through the issue and broke all items out into individual issues -- apart from the items at the top of this thread that have been ticked off (I assumed they are done already).

@hartytp hartytp closed this as completed May 8, 2019
@hartytp

hartytp commented May 8, 2019

There is now a v2.0 milestone, which includes all changes we've discussed for v2.0

@hartytp

hartytp commented Jun 6, 2019

@marmeladapk is there anything blocking finishing Kasli v2.0 other than people's schedules? Any idea what the timeline is?
