Understanding Data Replication in High Availability Configurations for GitHub Enterprise Server #190702
Replies: 2 comments
Context: I signed up for a GitHub trial (GitHub Pro / Copilot), where I was told a small verification charge (~$10) would be momentary and immediately released. GitHub’s handling of this charge is frankly unacceptable.

This was not a “temporary” hold in any meaningful sense. The authorized amount has now been locked for a week—and can reportedly last up to 30 days—despite being presented as something that would be reversed almost instantly.

GitHub Support claims the refund was processed “right away,” but this is misleading. In practice, GitHub delegates the release of the authorization to a financial intermediary, fully aware that this is not an immediate process. Describing that as “momentary” is simply inaccurate.

After speaking with my bank, the situation is clear: the hold has not been released in a way that makes the funds available. There are only two ways forward—either wait for the authorization to expire (which can take up to 30 days), or have GitHub explicitly confirm to the bank that the hold should be released.

This is the key issue: it should not be the customer’s responsibility to resolve this. If GitHub initiates the authorization, then GitHub should also ensure its timely release—without requiring users to chase their own money through banking procedures. In practice, GitHub claims it’s been handled, yet the funds remain unavailable.

For anyone on a tight budget, this is not a minor inconvenience—it directly affects day-to-day expenses. If GitHub knows these authorizations can persist for weeks, then calling them “momentary” is misleading and should be corrected. Users deserve clear, honest expectations about how long their money may actually be unavailable. This situation reflects poorly on both the transparency and accountability of GitHub’s billing practices.
Abstract
Running GitHub Enterprise Server in a High Availability (HA) configuration? You need to understand how data replicates between your primary and replica appliances - for capacity planning, troubleshooting, and ensuring clean failovers. The official documentation covers HA setup. This article goes deeper: how replication works under the hood, what affects performance, and what to watch to keep things healthy.
Problem Statement
GitHub Enterprise Server administrators running HA configurations often ask how replication works under the hood, what affects its performance, and how to troubleshoot it when it lags. The public docs cover setup and failover, but don't explain replication internals, performance factors, or troubleshooting in depth.
How Replication Works
GitHub Enterprise Server uses different replication strategies for different data types. Understanding these distinctions helps you troubleshoot issues and plan capacity.
The hub-and-spoke model
Replication follows a hub-and-spoke architecture: the primary appliance acts as the hub, and each replica receives its data from the primary rather than from other replicas.
Each node has a unique UUID, shown as `git-server-{UUID}` in diagnostic output. To identify your primary, run `ghe-repl-status -vv` on a replica and look for the node with `voting: true`, or check `/etc/github/repl-state` directly on any appliance.

Git repository replication
Git repositories replicate through the built-in "spokes" system:
Replication is asynchronous: your push succeeds without waiting for replicas to catch up. Maintenance operations run with a two-hour timeout. If they don't finish, the system retries. Backups pause repository maintenance to ensure a clean state.
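The asynchronous behavior described above can be sketched with a toy model: the primary acknowledges a push immediately, and a background worker applies it to the replica afterwards. The class, names, and timings below are illustrative assumptions, not GHES internals.

```python
import queue
import threading
import time

class AsyncReplica:
    """Toy model of asynchronous replication: writes are acknowledged
    on the primary before the replica has applied them."""

    def __init__(self):
        self.primary = []
        self.replica = []
        self._backlog = queue.Queue()
        threading.Thread(target=self._apply_loop, daemon=True).start()

    def push(self, commit):
        self.primary.append(commit)   # primary acknowledges immediately
        self._backlog.put(commit)     # replica applies later

    def _apply_loop(self):
        while True:
            commit = self._backlog.get()
            time.sleep(0.01)          # simulated network/apply delay
            self.replica.append(commit)
            self._backlog.task_done()

    def lag(self):
        """Commits acknowledged by the primary but not yet on the replica."""
        return len(self.primary) - len(self.replica)

repl = AsyncReplica()
for i in range(5):
    repl.push(f"commit-{i}")        # returns without waiting for the replica

print("lag right after pushing:", repl.lag())  # usually > 0 while the replica catches up
repl._backlog.join()                           # wait for the backlog to drain
print("lag after catch-up:", repl.lag())       # 0
```

The key property to notice is that `push` never blocks on the replica, which is why a healthy primary can still show nonzero replication lag under load.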
Route updates happen automatically after configuration changes, but you may need `ghe-spokesctl routes` to check and repair routes manually.

MySQL database replication
MySQL uses binary log (binlog) replication: the primary records changes to its binlog, and the replica applies them in order.
You can check MySQL replication lag with `ghe-repl-status -vv`, which shows the `seconds_behind_primary` metric. This is the most common lag indicator to watch.

Redis replication
Redis uses asynchronous replication. Since Redis is an in-memory store, its replication is typically fast.
Elasticsearch replication
Elasticsearch runs on both primary and replica in HA configurations, maintaining index replication across nodes. The replica keeps a synchronized copy of all search indexes. For more on how GitHub Enterprise Server rebuilt search replication for high availability, see this engineering blog post.
Storage and asset replication
File-based storage (user avatars, release assets, Git LFS objects) replicates with rsync.
Pages replication
GitHub Pages sites replicate through the spokes system, just like Git repositories.
Factors Affecting Replication Performance
Network bandwidth and latency
Since all replication traffic flows through the VPN between appliances, the bandwidth and latency of that link directly limit replication throughput.
Monitor network throughput between your appliances, especially if you see persistent lag.
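As a quick sanity check on that link, the sketch below times a TCP handshake to estimate round-trip latency. The hostname is a placeholder, and probing port 122 (the GHES administrative SSH port) is an assumption for illustration; this measures latency, not bandwidth.

```python
import socket
import time

def tcp_rtt_ms(host, port, timeout=2.0):
    """Rough latency probe: time a TCP handshake to the peer.

    A handshake time consistently well above your normal baseline is a
    hint that replication lag may be network-related.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.monotonic() - start) * 1000.0

# Example (replace with your replica's address):
# print(f"RTT to replica: {tcp_rtt_ms('replica.example.com', 122):.1f} ms")
```

Run it periodically from the primary toward each replica and compare against the lag you observe in `ghe-repl-status`.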
Repository size and activity
Larger repos and higher push rates increase replication load.
Maintenance operations
Repository maintenance directly affects replication: running garbage collection (`git gc`) on large repos generates significant traffic as the optimized data replicates. The longer you wait between maintenance runs, the more stale refs accumulate - making the next run take even longer.
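A back-of-the-envelope model of that accumulation effect (the linear cost assumption and the numbers are purely illustrative, not measured GHES behavior):

```python
def refs_to_process(pushes_per_day, days_between_runs):
    """Toy model: each push leaves behind stale refs and loose objects
    that the next maintenance run must walk, so deferring maintenance
    makes every run bigger."""
    return pushes_per_day * days_between_runs

# The same repository, maintained weekly vs. monthly:
print(refs_to_process(200, 7))    # 1400
print(refs_to_process(200, 30))   # 6000
```

Even under this simplistic linear model, a monthly schedule makes each run several times larger than a weekly one, and the resulting repack must then replicate in full.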
Resource constraints
CPU, memory, and disk I/O on both primary and replica affect replication.
If `ghe-spokesctl check --fix` fails with "too busy" errors, you're hitting resource constraints.

Monitoring Replication Health
Using ghe-repl-status
Run `ghe-repl-status` from any replica to check replication health. Add `-vv` for verbose output with detailed metrics.

Checking repository network health
Use `ghe-spokesctl` to check repository network health. Several `ghe-spokes` subcommands have already been replaced by `ghe-spokesctl` equivalents, and the remaining ones are expected to follow.

Watch for bad checksums - they mean replica data doesn't match the primary.
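Conceptually, a checksum comparison works like the sketch below: each node independently digests its repository contents and the results are compared. This is a conceptual stand-in, not the actual spokes algorithm.

```python
import hashlib

def repo_checksum(objects):
    """Order-independent digest over a node's object payloads."""
    digest = hashlib.sha256()
    for obj in sorted(objects):
        digest.update(hashlib.sha256(obj).digest())
    return digest.hexdigest()

primary_objects = [b"commit-a", b"commit-b", b"commit-c"]
replica_objects = [b"commit-a", b"commit-b"]   # one object missing

print(repo_checksum(primary_objects) == repo_checksum(primary_objects))  # True: in sync
print(repo_checksum(primary_objects) == repo_checksum(replica_objects))  # False: "bad checksum"
```

The point of the digest is cheap comparison: the nodes exchange one small hash instead of their full object sets, and any divergence anywhere in the data changes the result.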
Job queue monitoring
Check job queue backlogs, especially the maintenance queue. Large backlogs signal system stress and can cause replication delays. The older `ghe-resque-info` command still works, but most background jobs now run through aqueduct.

Key metrics to watch
When analyzing support bundles or monitoring HA health, check these in order: `ghe-repl-status` output, spokes replication health, job queue depth, and resource utilization.
HA Configuration Patterns
Two-node high availability
In a standard two-node configuration, a single primary replicates all datastores to a single replica.
Geo-replication
With geo-replication, multiple replicas in different locations receive data from the primary.

Trade-offs: replication traffic must cross longer, higher-latency links, so distant replicas are more likely to lag.
Best Practices
Network design
Capacity planning
Repository management
Run `ghe-spokesctl check -v` proactively on your largest repos.

Backup coordination
Backups pause repository maintenance to ensure a clean state, so plan your backup schedule around that pause.
Troubleshooting Common Issues
Repositories with bad checksums
Symptom: `ghe-spokesctl status` shows checksum mismatches

Common causes:
Resolution:
1. `ghe-spokesctl check -v owner/repo`
2. `ghe-spokesctl check -v --fix owner/repo`

Repository network route issues
Symptom: Repositories not replicating after configuration changes
Common causes:
Resolution: Run `ghe-spokesctl routes owner/repo` to check and repair routes.

Persistent replication lag
Symptom: `ghe-repl-status` consistently shows lag across multiple datastores

Common causes:
Resolution: Check background job backlogs with `ghe-aqueduct-info`.

Summary
Understanding how data replicates in your HA configuration helps you plan capacity, troubleshoot performance, and ensure clean failovers. Key takeaways:
Monitor `ghe-repl-status`, spokes info, job queues, and resource utilization.

When you hit replication issues, work through the diagnostics systematically: resource utilization first, then spokes health, then specific repository networks. Most problems stem from resource constraints or network limitations - not bugs in the replication system.
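That triage can begin with something as simple as scanning status output for trouble lines. The sample output and field names below are made up for illustration (real `ghe-repl-status -vv` output varies by GHES version), so treat this purely as a sketch.

```python
import re

# Hypothetical sample of verbose replication-status output.
SAMPLE = """\
OK: mysql replication is in sync
OK: redis replication is in sync
CRITICAL: git replication is behind the primary
  seconds_behind_primary: 147
"""

def replication_problems(status_text, lag_threshold=60):
    """Collect warning/critical lines and flag excessive lag."""
    problems = [line.strip() for line in status_text.splitlines()
                if line.startswith(("WARNING", "CRITICAL"))]
    match = re.search(r"seconds_behind_primary:\s*(\d+)", status_text)
    if match and int(match.group(1)) > lag_threshold:
        problems.append(f"lag {match.group(1)}s exceeds {lag_threshold}s threshold")
    return problems

print(replication_problems(SAMPLE))
```

A script like this, run from cron on a replica and pointed at the real command's output, gives you an early warning before lag becomes a failover risk.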
Planning a new HA deployment, dealing with persistent lag, or need help troubleshooting? Reach out to GitHub Support with a support bundle. We're happy to help analyze your configuration.