Maxing out a Factom network

Inspired by the latest testnet results, here are two observations about performance.

1. Factomd seems to be IO bound: maxing out a network will never max out the CPU of a single node.
I take this to mean that there is still room to improve.

2. On mainnet - we don't see the expected # of duplicate messages (from the point of view of a single node).
Essentially this means the gossip protocol performs differently on a larger network.

I need to find the actual data again - it was something like: we expected a given node to see 6 copies of a message, but we only see 4 on mainnet.
In simulation, meanwhile, we see the 'expected' number.

Comparison to smaller networks:

In a very controlled environment - running a 5-node network on underpowered (n1-standard-1) machines - stable performance peaks at 40-61 TPS.
If I boost the same machines to 4 vCPUs (n1-standard-4), the network maxes out at 93 TPS.



Test output: https://gist.github.com/stackdump/07ed7ec0b8fb35a005e53f88417dad00 (from the run with the highest load)
 
1. You say factomd is IO bound rather than CPU bound. Then you proceed to a test where you increase the CPU capacity of the nodes and it results in a TPS increase. Isn't that contradictory?
Yes, seemingly - but a larger VM will surely support more IO as well.

The observation about the CPU is that we cannot force it to 100% utilization by loading it.

It could also mean that there is just some aspect of the current design that is throttling throughput...

I think more careful benchmarking & profiling would probably lead to some answers.
 
1. Factomd seems to be IO bound, maxing out a network will never max out the CPU of a single node.
Not necessarily. Golang channels are kinda tricky and due to the cooperative scheduling, I found that some channels are just not being worked off "correctly", meaning channel A will drain completely even though channel B has a much larger pile waiting. In practice, this resulted in some of the non-blocking channels overflowing when they shouldn't.

That's the reason why in factomd some of the channels have a capacity of 5,000 or more: it gives them enough of a buffer to pile up. For p2p2, I ended up inserting a manual release of the goroutine after it pulls a parcel from the parcel channel: https://github.com/WhoSoup/factom-p2p/blob/master/parcelChannel.go#L18 - and it's now using a channel capacity of just 50 without dropping parcels. (More details, with playground examples, in this issue.)
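To make the pattern concrete, here is a minimal sketch of that idea - not the actual parcelChannel.go code, all names and numbers below are illustrative - where the reader yields after every receive so one busy loop can't starve the goroutines filling other channels:

```go
package main

import (
	"fmt"
	"runtime"
)

// Parcel is a stand-in for whatever payload the channel carries.
type Parcel struct {
	Payload []byte
}

// drain pulls parcels off a small buffered channel and manually releases
// the goroutine after each receive, so other goroutines get scheduled
// instead of this loop monopolizing its thread.
func drain(parcels <-chan Parcel, done chan<- struct{}) {
	for p := range parcels {
		_ = p             // process the parcel here
		runtime.Gosched() // manual release of the goroutine
	}
	close(done)
}

func main() {
	parcels := make(chan Parcel, 50) // small buffer instead of 5,000+
	done := make(chan struct{})
	go drain(parcels, done)

	for i := 0; i < 500; i++ {
		parcels <- Parcel{Payload: []byte{byte(i)}}
	}
	close(parcels)
	<-done
	fmt.Println("all parcels drained without overflowing")
}
```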

The reason the CPU doesn't get maxed out could be due to issues like that, where CPU threads are idling because specific channels aren't being filled fast enough or block because they're not drained. You can check this by seeing if some cores are maxed out while others are idling. If all cores have an equal load, it's probably not that.

In the long run, bandwidth will be the limiting factor due to gossip's inefficiency.

2. On mainnet - we don't see the expected # of duplicate messages (from the point of view of a single node).
Essentially this means the gossip protocol performs differently on a larger network.
I wrote about this a lot in my early blogs, such as: https://factomize.com/gossip-analysis-and-optimization/ and https://factomize.com/gossip-optimization-part-2-breaking-the-hub/.

There is also the theoretical data on waste messages I collected at https://github.com/WhoSoup/p2p-fanout/wiki. The TL;DR is that with enough message bounces, you should see around the same # of duplicates as the -broadcast setting (16 for mainnet). However, since I gathered that data, Clay improved the smart filter in factomd and now it no longer tries to send messages to peers that have already sent you that specific message.
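Roughly, the improved filter behaves like the sketch below - hypothetical code, not the actual factomd implementation - where peers that already delivered a given message hash are skipped when choosing rebroadcast targets:

```go
package main

import "fmt"

type msgHash [32]byte

// smartFilter is an illustrative "don't echo back" broadcast filter.
type smartFilter struct {
	// seenFrom tracks which peers already sent us each message.
	seenFrom map[msgHash]map[string]struct{}
}

func newSmartFilter() *smartFilter {
	return &smartFilter{seenFrom: make(map[msgHash]map[string]struct{})}
}

// Record notes that peer delivered msg to us.
func (f *smartFilter) Record(msg msgHash, peer string) {
	if f.seenFrom[msg] == nil {
		f.seenFrom[msg] = make(map[string]struct{})
	}
	f.seenFrom[msg][peer] = struct{}{}
}

// Targets picks up to fanout peers that have NOT already sent us msg.
func (f *smartFilter) Targets(msg msgHash, peers []string, fanout int) []string {
	targets := make([]string, 0, fanout)
	for _, p := range peers {
		if _, dup := f.seenFrom[msg][p]; dup {
			continue // this peer already has the message
		}
		targets = append(targets, p)
		if len(targets) == fanout {
			break
		}
	}
	return targets
}

func main() {
	f := newSmartFilter()
	var m msgHash // zero hash, just for the demo
	f.Record(m, "peerA")
	f.Record(m, "peerC")
	fmt.Println(f.Targets(m, []string{"peerA", "peerB", "peerC", "peerD"}, 16))
	// Output: [peerB peerD]
}
```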

The big goal of p2p2 was to re-order the network structure of the nodes and be able to get away with a much lower fanout setting while still keeping a high reach. That has the potential of reducing the number of messages by ~50% and combined with a much lower encoding overhead, should significantly increase the (theoretical) bandwidth utilization.
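Back-of-the-envelope: with a fanout of f, every node forwards each broadcast to roughly f peers, so the total parcels per message scale with N × f and each node sees about f copies. Keeping the same reach while cutting the fanout from 16 to ~8 is where the ~50% reduction would come from.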

I've been collecting a couple of mainnet traffic captures (receiving application messages only), here's some data that maybe helps:

Duration: 68h 44m 56s
Messages per second: ~19.2
EPS: ~1.9
TPS: ~3.8
Raw bandwidth: ~0.037 Mbit/s
Size: 1,133,187,694 bytes (~1.05GiB)

Total # of Messages: 4,750,722
Duplicates: 3,987,292 (~83.9% or ~5 of 6)

However, a good chunk of those are "MissingMsg", which are not broadcast messages. If we filter out non-broadcast messages, we get: 3,987,292 dupes / 4,580,405 total (~87%), or roughly 7 out of 8 duplicates.
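(Sanity check on the derived numbers: 68h 44m 56s is 247,496 seconds, so 4,750,722 messages / 247,496 s ≈ 19.2 messages per second, and 1,133,187,694 bytes / 247,496 s ≈ 4.6 KB/s ≈ 0.037 Mbit/s. For the duplicates, 3,987,292 / 4,580,405 ≈ 0.87, i.e. roughly 7 of every 8 received broadcast messages were copies of something already seen.)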

Judging from the really low bandwidth, we should be able to get much higher EPS in a test environment. In a simple scale-up, 5 Mbit down/up should get us ~180 EPS (there's a 30% overhead from encoding v9 messages). In your test setup, you have a 2 Gbit connection, which should be plenty for that.
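(Rough math: 5 Mbit/s ÷ 0.037 Mbit/s ≈ 135× the captured traffic, and 135 × 1.9 EPS × 0.7 - discounting the ~30% v9 encoding overhead - comes out right around 180 EPS.)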

At some point I'd like to get a full message dump (both send and receive, and not just app messages) for proper analysis. It would also be great to test the p2p network independently of the factomd code, which would test the theory of whether I/O is the limiting factor or not.
 
Thanks for providing a thorough analysis (as always)

Not necessarily. Golang channels are kinda tricky and due to the cooperative scheduling, I found that some channels are just not being worked off "correctly", meaning channel A will drain completely even though channel B has a much larger pile waiting. In practice, this resulted in some of the non-blocking channels overflowing when they shouldn't.
I guess channels that are being starved of work while others are overflowing would be one symptom we could hunt for.

It's sort of depressing that this doesn't feel like a new line of investigation - essentially this is what has been done repeatedly since M3 was first released.


At some point, I'd like to get a full message dump (both send and receive, and not just app messages) for proper analysis. It would also be great to test the p2p network independently of the factomd code, which would test the theory of whether I/O is the limiting factor or not.
My notion of IO is pretty vague - I'm not thinking of disk or memory IO, but more of system-level interrupts, etc.
This may actually just be about Golang/channel scheduling, as you say.

Seeing a p2p max (without troublesome factomd code) would be an interesting data point.
 