
Network.AWS.*.Types.Product seems rather demanding to compile #304

Closed
DaveCTurner opened this issue Jul 20, 2016 · 10 comments

Comments

@DaveCTurner

Particularly the EC2 one:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
3725 linuxad+  20   0  1.000t 2.856g  26744 R 100.0 74.0   4:53.88 ghc

I hypothesise this would be better if it were in more, smaller modules. Opening this issue here as I will investigate it at some point if nobody else gets there first, although definitely not before September.

@brendanhay
Owner

Yes, this is known.

At the beginning of all time, there was only a single .Types module. Then cameth the .Sum and .Product split which alleviated it somewhat.

Breaking it up any further than that requires checking for dependencies between the types and then splitting them into n arbitrary modules, which to be honest hasn't been high on my priority list.

@brendanhay
Owner

I'll also add that a non-negligible part of the compilation time is due to deriving statements. For example, there is a measurable difference between compiling with deriving (Generic, ..) and without it.

@DaveCTurner
Author

I can believe it. It'd be interesting to see whether generating explicit instance declarations helps with that.
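
To make that concrete, here is a toy sketch (not amazonka's actual generated code; the Tag type and its fields are made up) contrasting a fully-derived product type with one whose instances are written out explicitly, which is roughly what generating explicit instance declarations would look like:

    {-# LANGUAGE DeriveGeneric #-}

    module TagExample where

    import GHC.Generics (Generic)

    -- Current generated style: every instance is derived, and GHC must
    -- elaborate all of them for each of the many hundreds of product types.
    data Tag = Tag
      { tagKey   :: !String
      , tagValue :: !String
      } deriving (Eq, Ord, Read, Show, Generic)

    -- Hypothetical alternative: the generator emits the instance bodies
    -- itself, trading larger generated source for (possibly) less work
    -- at compile time.
    data Tag' = Tag'
      { tagKey'   :: !String
      , tagValue' :: !String
      }

    instance Eq Tag' where
      a == b = tagKey' a == tagKey' b && tagValue' a == tagValue' b

    instance Show Tag' where
      showsPrec d (Tag' k v) =
        showParen (d > 10) $
          showString "Tag' " . showsPrec 11 k . showString " " . showsPrec 11 v

Whether the explicit form actually compiles faster would of course need benchmarking against the generated EC2 modules.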

@kim
Contributor

kim commented Nov 22, 2016

Any changes recently that may have worsened the situation? After a long while with all cores maxed out building amazonka-ec2-1.4.4, I reliably get:

    ghc: panic! (the 'impossible' happened)
      (GHC version 8.0.1 for x86_64-apple-darwin):
    	piResultTys1
      Show AcceptVPCPeeringConnectionResponse
      [[Char]]

    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

GHC 7.10.3 terminates, but is agonisingly slow as well.

@brendanhay
Owner

brendanhay commented Nov 22, 2016

No changes have been made since this issue was opened, aside from new types generated from upstream service definition changes.

Do you still get this issue if you pass -j2 (or -j1) to stack and cabal and go on holiday for a bit?

I've recently made some inroads into splitting the types modules up further, but have yet to complete it or benchmark any potential compilation time gains.

@kim
Contributor

kim commented Nov 22, 2016

-j1 seems to avoid the panic.

EDIT: amended; I had gotten lost in curated package sets. 1.4.3 seems to work, though.

@DaveCTurner
Author

I was just thinking about this again, particularly with regard to the EC2 compile. Here are some results about the dependency structure of shapes that may be of use:

  1. There are no dependency cycles between shapes (in the EC2 service at least), so one-module-per-type would work. However, I imagine this extreme may be almost as bad for performance as the current monolithic module, so I'm thinking of better ways to group the types into clusters...

  2. Given that there are no cycles, one could simply topologically sort the shapes and declare their types in batches of a fixed size. (On the assumption that if there were any cycles elsewhere they would be small, this technique would also work as long as cycles were not split across batches.) The resulting modules would make no logical sense but I expect they would be quicker to compile, for some optimal batch size. There would still be a lot of types in scope while compiling successive modules, and there would be no opportunity for parallelisation. Thinking now about more logical groupings, to reduce the quantity of types that need importing and improve parallelisation.

  3. Approximately half the shapes (rough guess, counting both sum and product types) relate to a single operation, and thus could be declared in the operation's module instead.

  4. Of the remaining ~230 shapes, just under 100 appear as a field of exactly one other shape, so for simplicity each of these can be declared alongside the one shape that uses it.

  5. TagList is used in quite a lot of places, so might be worth declaring separately.

  6. Of the remaining ~130 top-level shapes there is one substantial cluster of interrelated ones that might make sense to compile together:
    SecurityGroupIdStringList
    PrivateIpAddressSpecificationList
    InstanceNetworkInterfaceSpecificationList
    VolumeType
    BlockDeviceMapping
    BlockDeviceMappingList
    BlockDeviceMappingRequestList
    RIProductDescription
    SpotInstanceType
    SpotInstanceStateFault
    SpotDatafeedSubscription
    IamInstanceProfileSpecification
    InstanceType
    ReservedInstancesConfiguration
    RunInstancesMonitoringEnabled
    GroupIdentifierList
    SpotPlacement
    SpotInstanceRequestList
    ExcessCapacityTerminationPolicy
    SpotFleetRequestConfigData
    AttachmentStatus
    InstanceBlockDeviceMappingList
    NetworkInterfaceAttachment
    VpcAttachment
    InternetGateway
    NetworkInterfaceStatus
    NetworkInterfaceAssociation
    NetworkInterface
    GatewayType
    VpnState
    VpnConnection
    VpnGateway
    (and, of course, the types of their fields etc.) This can be defined programmatically by looking for a connected component of the dependency graph starting from, say, VpnGateway (see the sketch at the end of this comment).

  7. There's a handful of other, much smaller clusters - the next largest is a niner:
    DiskImageFormat
    ExportEnvironment
    ContainerFormat
    ExportTask
    DiskImageDetail
    DiskImageDescription
    DiskImageVolumeDescription
    PlatformValues
    ConversionTask
    There are also many isolated nodes in this graph.

I therefore reckon you could get good results by putting all the types related to a single operation into that operation's module, picking out a handful of special cases (e.g. TagList) to declare as "base" types, then organising the rest into connected components and packing the components into modules so as to avoid modules of extreme size (either too big or too small).
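
For reference, a rough sketch of the graph bookkeeping behind points 2 and 6, using Data.Graph from the containers package. The ShapeName alias, the tiny shapeDeps sample, and the function names are placeholders for illustration; the real input would come from the service definitions the generator already parses:

    import           Data.Graph (components, graphFromEdges, topSort)
    import qualified Data.Tree  as Tree

    type ShapeName = String

    -- Each shape paired with the shapes its fields refer to (illustrative only).
    shapeDeps :: [(ShapeName, [ShapeName])]
    shapeDeps =
      [ ("VpnGateway",       ["VpcAttachment", "VpnState"])
      , ("VpcAttachment",    ["AttachmentStatus"])
      , ("VpnState",         [])
      , ("AttachmentStatus", [])
      , ("TagList",          [])
      ]

    -- Point 2: a dependency-respecting ordering, chopped into fixed-size batches.
    topoBatches :: Int -> [(ShapeName, [ShapeName])] -> [[ShapeName]]
    topoBatches n deps = chunk ordered
      where
        (g, fromVertex, _) = graphFromEdges [(name, name, ds) | (name, ds) <- deps]
        -- topSort puts a vertex before everything it points at, i.e. before
        -- its dependencies, so reverse the order to get dependencies first.
        ordered = reverse [ name | v <- topSort g, let (name, _, _) = fromVertex v ]
        chunk [] = []
        chunk xs = let (a, b) = splitAt n xs in a : chunk b

    -- Point 6: connected components of the shape graph (edges taken as
    -- undirected); each component is a candidate module.
    shapeClusters :: [(ShapeName, [ShapeName])] -> [[ShapeName]]
    shapeClusters deps =
      [ [ name | v <- Tree.flatten tree, let (name, _, _) = fromVertex v ]
      | tree <- components g
      ]
      where
        (g, fromVertex, _) = graphFromEdges [(name, name, ds) | (name, ds) <- deps]

    main :: IO ()
    main = do
      print (topoBatches 2 shapeDeps)
      print (shapeClusters shapeDeps)

Batching the topological order keeps every module compilable in dependency order, while the connected components give the more "logical" groupings; packing small components together (and keeping the big VpnGateway-style component intact) would then bound module size.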

Hope that's of interest.

@brendanhay
Owner

brendanhay commented Nov 28, 2016

Thanks @DaveCTurner, that's helpful and in line with my own findings.

The problem is that, due to the auto-generated nature of the libraries, I need to bake some form of maximal independent set calculation on the shape graph into the generation step. This is where the work currently lies, as it needs to be done for all libraries and not just amazonka-ec2. I actually started exploring the problem a month or so back, but haven't managed to progress further due to time constraints.

@DaveCTurner
Author

I see, yes. I was looking at EC2 since that seems to have the largest set of shapes (by about a factor of 2) so anything that works there has a decent chance of working elsewhere too.

I might be able to help with dividing the shapes up into sets that import nicely - I had something a bit different in mind from what you have so far - but I'm a bit daunted by actually generating the code from them. Perhaps it'd work to divide the work up along those lines?

@endgame
Collaborator

endgame commented May 10, 2021

Given that the .Sum and .Product modules are gone, the generated libraries now have one type per file, and #549 and #550 are discussing the upstream GHC issues, should we close this @brendanhay?

amazonka-ec2 remains a bit of a monster, but a 16-wide nix build of the CI set completed on my 32G machine last night with a few hundred MB to spare. Previously, such wide builds would OOM.
