
Network.AWS.*.Types.Product seems rather demanding to compile #304

Closed
DaveCTurner opened this issue Jul 20, 2016 · 10 comments

Comments

@DaveCTurner

Particularly the EC2 one:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
3725 linuxad+  20   0  1.000t 2.856g  26744 R 100.0 74.0   4:53.88 ghc

I hypothesise this would be better if it were in more, smaller modules. Opening this issue here as I will investigate it at some point if nobody else gets there first, although definitely not before September.

@brendanhay
Owner

Yes, this is known.

At the beginning of all time, there was only a single .Types module. Then cameth the .Sum and .Product split which alleviated it somewhat.

Breaking it up any further than that requires checking for dependencies between the types and then splitting them into n arbitrary modules, which to be honest hasn't been high on my priority list.

@brendanhay
Owner

I'll also add that a non-negligible part of the compilation time is due to deriving statements. For example, there is a measurable difference between compiling with deriving (Generic, ..) and without it.

@DaveCTurner
Author

I can believe it. It'd be interesting to see whether generating explicit instance declarations helps with that.
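
To make that concrete, here is a toy sketch (not amazonka's actual generated code; the Tag type and its fields are made up) contrasting a fully-derived product type with one whose instances are written out explicitly, which is roughly what generating explicit instance declarations would look like:

    {-# LANGUAGE DeriveGeneric #-}

    module TagExample where

    import GHC.Generics (Generic)

    -- Current generated style: every instance is derived, and GHC must
    -- elaborate all of them for each of the many hundreds of product types.
    data Tag = Tag
      { tagKey   :: !String
      , tagValue :: !String
      } deriving (Eq, Ord, Read, Show, Generic)

    -- Hypothetical alternative: the generator emits the instance bodies
    -- itself, trading larger generated source for (possibly) less work
    -- at compile time.
    data Tag' = Tag'
      { tagKey'   :: !String
      , tagValue' :: !String
      }

    instance Eq Tag' where
      a == b = tagKey' a == tagKey' b && tagValue' a == tagValue' b

    instance Show Tag' where
      showsPrec d (Tag' k v) =
        showParen (d > 10) $
          showString "Tag' " . showsPrec 11 k . showString " " . showsPrec 11 v

Whether the explicit form actually compiles faster would of course need benchmarking against the generated EC2 modules.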

@kim
Contributor

kim commented Nov 22, 2016

Any changes recently that may have worsened the situation? After a long while with all cores maxed out building amazonka-ec2-1.4.4, I reliably get:

    ghc: panic! (the 'impossible' happened)
      (GHC version 8.0.1 for x86_64-apple-darwin):
    	piResultTys1
      Show AcceptVPCPeeringConnectionResponse
      [[Char]]

    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

GHC 7.10.3 terminates, but is agonisingly slow as well.

@brendanhay
Owner

brendanhay commented Nov 22, 2016

No changes have been made since this issue was opened, aside from new types generated from upstream service definition changes.

Do you still get this issue if you pass -j2 (or -j1) to stack and cabal and go on holiday for a bit?

I've recently made some inroads into splitting the types modules up further, but have yet to complete it or benchmark any potential compilation time gains.

@kim
Contributor

kim commented Nov 22, 2016

-j1 seems to avoid the panic.

EDIT: amended; I had gotten lost in curated package sets. 1.4.3 seems to work, though.

@DaveCTurner
Author

I was just thinking about this again, particularly with regard to the EC2 compile. Here are some results about the dependency structure of shapes that may be of use:

  1. There are no dependency cycles between shapes (in the EC2 service at least), so one-module-per-type would work. However, I imagine this extreme may be almost as bad for performance as the current monolithic module, so I'm thinking of better ways to group the types into clusters...

  2. Given that there are no cycles, one could simply topologically sort the shapes and declare their types in batches of a fixed size. (On the assumption that if there were any cycles elsewhere they would be small, this technique would also work as long as cycles were not split across batches.) The resulting modules would make no logical sense but I expect they would be quicker to compile, for some optimal batch size. There would still be a lot of types in scope while compiling successive modules, and there would be no opportunity for parallelisation. Thinking now about more logical groupings, to reduce the quantity of types that need importing and improve parallelisation.

  3. Approximately half the shapes (rough guess, counting both sum and product types) relate to a single operation, and thus could be declared in the operation's module instead.

  4. Of the remaining ~230 shapes, just under 100 appear as a field of exactly one other shape, so for simplicity each of these can be declared alongside the one shape that uses it.

  5. TagList is used in quite a lot of places, so might be worth declaring separately.

  6. Of the remaining ~130 top-level shapes there is one substantial cluster of interrelated ones that might make sense to compile together:
    SecurityGroupIdStringList
    PrivateIpAddressSpecificationList
    InstanceNetworkInterfaceSpecificationList
    VolumeType
    BlockDeviceMapping
    BlockDeviceMappingList
    BlockDeviceMappingRequestList
    RIProductDescription
    SpotInstanceType
    SpotInstanceStateFault
    SpotDatafeedSubscription
    IamInstanceProfileSpecification
    InstanceType
    ReservedInstancesConfiguration
    RunInstancesMonitoringEnabled
    GroupIdentifierList
    SpotPlacement
    SpotInstanceRequestList
    ExcessCapacityTerminationPolicy
    SpotFleetRequestConfigData
    AttachmentStatus
    InstanceBlockDeviceMappingList
    NetworkInterfaceAttachment
    VpcAttachment
    InternetGateway
    NetworkInterfaceStatus
    NetworkInterfaceAssociation
    NetworkInterface
    GatewayType
    VpnState
    VpnConnection
    VpnGateway
    (and, of course, the types of their fields etc.) This can be defined programmatically by looking for a connected component of the dependency graph starting from, say, VpnGateway (see the sketch at the end of this comment).

  7. There's a handful of other, much smaller clusters - the next largest is a niner:
    DiskImageFormat
    ExportEnvironment
    ContainerFormat
    ExportTask
    DiskImageDetail
    DiskImageDescription
    DiskImageVolumeDescription
    PlatformValues
    ConversionTask
    There are also many isolated nodes in this graph.

I therefore reckon you could get good results by putting all the types related to a single operation into that operation's module, picking out a handful of special cases (e.g. TagList) to declare as "base" types, then organising the rest into connected components and packing the components into modules so as to avoid modules of extreme size (either too big or too small).
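
For reference, a rough sketch of the graph bookkeeping behind points 2 and 6, using Data.Graph from the containers package. The ShapeName alias, the tiny shapeDeps sample, and the function names are placeholders for illustration; the real input would come from the service definitions the generator already parses:

    import           Data.Graph (components, graphFromEdges, topSort)
    import qualified Data.Tree  as Tree

    type ShapeName = String

    -- Each shape paired with the shapes its fields refer to (illustrative only).
    shapeDeps :: [(ShapeName, [ShapeName])]
    shapeDeps =
      [ ("VpnGateway",       ["VpcAttachment", "VpnState"])
      , ("VpcAttachment",    ["AttachmentStatus"])
      , ("VpnState",         [])
      , ("AttachmentStatus", [])
      , ("TagList",          [])
      ]

    -- Point 2: a dependency-respecting ordering, chopped into fixed-size batches.
    topoBatches :: Int -> [(ShapeName, [ShapeName])] -> [[ShapeName]]
    topoBatches n deps = chunk ordered
      where
        (g, fromVertex, _) = graphFromEdges [(name, name, ds) | (name, ds) <- deps]
        -- topSort puts a vertex before everything it points at, i.e. before
        -- its dependencies, so reverse the order to get dependencies first.
        ordered = reverse [ name | v <- topSort g, let (name, _, _) = fromVertex v ]
        chunk [] = []
        chunk xs = let (a, b) = splitAt n xs in a : chunk b

    -- Point 6: connected components of the shape graph (edges taken as
    -- undirected); each component is a candidate module.
    shapeClusters :: [(ShapeName, [ShapeName])] -> [[ShapeName]]
    shapeClusters deps =
      [ [ name | v <- Tree.flatten tree, let (name, _, _) = fromVertex v ]
      | tree <- components g
      ]
      where
        (g, fromVertex, _) = graphFromEdges [(name, name, ds) | (name, ds) <- deps]

    main :: IO ()
    main = do
      print (topoBatches 2 shapeDeps)
      print (shapeClusters shapeDeps)

Batching the topological order keeps every module compilable in dependency order, while the connected components give the more "logical" groupings; packing small components together (and keeping the big VpnGateway-style component intact) would then bound module size.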

Hope that's of interest.

@brendanhay
Owner

brendanhay commented Nov 28, 2016

Thanks @DaveCTurner, that's helpful and in line with my own findings.

The problem is that, due to the auto-generated nature of the libraries, I need to bake some form of maximal independent set calculation on the shape graph into the generation step. This is where the work currently lies, as it needs to be done for all libraries and not just amazonka-ec2. I actually started exploring the problem a month or so back, but haven't managed to progress further due to time constraints.

@DaveCTurner
Author

I see, yes. I was looking at EC2 since that seems to have the largest set of shapes (by about a factor of 2) so anything that works there has a decent chance of working elsewhere too.

I might be able to help with dividing the shapes up into sets that import nicely - I had something a bit different in mind from what you have so far - but I'm a bit daunted by actually generating the code from them. Perhaps it'd work to divide the work up along those lines?

@endgame
Collaborator

endgame commented May 10, 2021

Given that the .Sum and .Product modules are gone, the generated libraries now have one type per file, and #549 and #550 are discussing the upstream GHC issues, should we close this @brendanhay?

amazonka-ec2 remains a bit of a monster, but a 16-wide nix build of the CI set completed on my 32G machine last night with a few hundred MB to spare. Previously, such wide builds would OOM.
