Balancing Flexibility and Complexity - Pulumi

November 28, 2024 Dipankar Haldar

The Real Problem with Pulumi: Balancing Flexibility and Complexity

Pulumi has rapidly gained traction in the Infrastructure as Code (IaC) space by offering a developer-friendly approach to cloud provisioning. Unlike traditional IaC tools like Terraform or CloudFormation, Pulumi allows engineers to define infrastructure using familiar programming languages such as TypeScript, Python, and C#.

While Pulumi brings significant advantages—better code reuse, improved maintainability, and seamless integration with CI/CD workflows—it also introduces real-world challenges that can complicate infrastructure management. This article explores the real problem with Pulumi: balancing its flexibility with the complexity it introduces in large-scale cloud deployments.

1. State Management is a Double-Edged Sword

Pulumi supports both self-managed state backends (cloud storage such as Azure Blob Storage or AWS S3) and the fully managed Pulumi Cloud backend (formerly the Pulumi Service). While this flexibility is great, managing state in Pulumi isn’t always smooth.

Common Issues:

State Locks and Concurrent Updates: Running pulumi up simultaneously from different machines can lead to state corruption if proper locking isn’t in place.
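Drift Detection is Weak: Unlike Terraform, which refreshes state as part of every plan by default, Pulumi compares against its last recorded state unless you run pulumi refresh (or pulumi up --refresh). Changes made outside Pulumi can therefore go unnoticed and cause surprise failures when applying updates.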
State Migration Complexity: Moving from one backend (e.g., self-managed to Pulumi Service) isn’t always straightforward and can cause unexpected issues.
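
To make the drift point concrete, here is a minimal sketch of what a drift check boils down to: comparing the properties recorded in state with what the cloud reports right now. The ResourceProps type, the detectDrift function, and the sample values are hypothetical and only for illustration; this is not how Pulumi implements it internally.

```typescript
// Hypothetical shape: a flat map of property name -> value for one resource.
type PropValue = string | number | boolean;
type ResourceProps = Record<string, PropValue>;

interface DriftEntry {
  property: string;
  recorded: PropValue | undefined;
  actual: PropValue | undefined;
}

// Compare what the state file recorded with what the cloud reports now.
function detectDrift(recorded: ResourceProps, actual: ResourceProps): DriftEntry[] {
  // Union of property names from both sides, de-duplicated and sorted.
  const keys = Object.keys(recorded)
    .concat(Object.keys(actual))
    .filter((key, index, all) => all.indexOf(key) === index)
    .sort();

  const drift: DriftEntry[] = [];
  for (const property of keys) {
    if (recorded[property] !== actual[property]) {
      drift.push({ property, recorded: recorded[property], actual: actual[property] });
    }
  }
  return drift;
}

// Sample data (made up): someone changed the SKU and added a setting by hand.
const recordedState: ResourceProps = { sku: "Standard_LRS", httpsOnly: true };
const liveResource: ResourceProps = { sku: "Standard_GRS", httpsOnly: true, publicAccess: false };

const drift = detectDrift(recordedState, liveResource);
console.log(drift);
```

A check like this is only as good as how recently the "actual" side was read, which is why a stale state file can hide out-of-band changes until an update fails.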

2. Pulumi’s Type System Can Be Frustrating

Pulumi’s SDKs leverage static typing, which is great for catching errors early. However, the way Pulumi handles async values (Output<T>) can be painful for developers.

Real-World Pain Points:

Nested Outputs: When dealing with Azure resources, we often get deeply nested Output<T> types that must be resolved before they can be used elsewhere.
No Direct Mapping of Inputs and Outputs: Unlike Terraform, where outputs can be referenced as variables easily, Pulumi requires explicitly unwrapping outputs, leading to boilerplate code:

const storageAccountName = storageAccount.name.apply(name => `sa-${name}`);

This added complexity can make infrastructure code harder to read and maintain.
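
Why can't the value simply be read directly? The following is a toy sketch of the idea, assuming a value that only becomes known once the cloud provider reports it. The Deferred class is a hypothetical stand-in written for this post, not Pulumi's Output<T> implementation, and it skips everything the real type does around dependencies and secrets.

```typescript
// Toy stand-in for a value that is only known after deployment.
// NOT Pulumi's Output<T>; just the same idea in miniature.
class Deferred<T> {
  private value: T | undefined = undefined;
  private resolved = false;
  private listeners: Array<(value: T) => void> = [];

  // Called once the "cloud" finally reports the real value.
  resolve(value: T): void {
    if (this.resolved) return;
    this.value = value;
    this.resolved = true;
    for (const listener of this.listeners) listener(value);
    this.listeners = [];
  }

  // Derive a new deferred value; the callback runs only when the value exists.
  apply<U>(fn: (value: T) => U): Deferred<U> {
    const derived = new Deferred<U>();
    const run = (value: T) => derived.resolve(fn(value));
    if (this.resolved) run(this.value as T);
    else this.listeners.push(run);
    return derived;
  }

  // For inspection in this sketch only; undefined until resolved.
  peek(): T | undefined {
    return this.value;
  }
}

const accountName = new Deferred<string>();
const prefixed = accountName.apply(name => `sa-${name}`);

const valueBefore = prefixed.peek(); // nothing to read yet
accountName.resolve("logs001");      // made-up name standing in for the provider's answer
const valueAfter = prefixed.peek();

console.log(valueBefore, valueAfter);
```

In real Pulumi programs, the pulumi.interpolate and pulumi.all helpers cover the most common string-building and multi-value cases and remove some of this boilerplate.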

3. Azure Support Lags Behind Other Providers

Pulumi supports multiple cloud providers, but Azure support often feels a step behind AWS and GCP. This is especially noticeable when using the @pulumi/azure-native provider.

Examples of Azure-Specific Issues:

Inconsistent API Coverage: Some Azure services don’t have full Pulumi support, requiring direct API calls via the Azure SDK.
Slow Provider Updates: When Azure releases new features, Pulumi often takes longer than Terraform to support them.
Managed Identity Issues: RBAC-based authentication can be inconsistent across different resource types.

4. Secret Management is Not Always Seamless

Pulumi has built-in secret management through its config system, but enterprise environments often require integration with tools like Azure Key Vault, AWS Secrets Manager, or HashiCorp Vault.

Challenges Include:

Secrets in Code: If not handled properly, sensitive values can accidentally be logged or exposed in Pulumi state.
Cross-Team Collaboration: Developers managing infrastructure as code may not have direct access to secret management tools, leading to operational friction.
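
The accidental-logging risk is easy to demonstrate. Below is a minimal sketch of a wrapper that keeps a sensitive value out of log lines and JSON dumps. The makeSecret helper and the sample password are hypothetical; within Pulumi itself, the supported mechanisms are secret config values (pulumi config set --secret) and secret outputs (pulumi.secret), which this sketch does not replace.

```typescript
// Hypothetical helper: the raw value lives only in the closure,
// so it never appears as a property on the returned object.
function makeSecret(raw: string) {
  return {
    // Explicit, greppable access for the one place that truly needs the value.
    reveal: (): string => raw,
    // String interpolation sees only the placeholder.
    toString: (): string => "[secret]",
    // JSON.stringify calls toJSON, so serialized output is redacted too.
    toJSON: (): string => "[secret]",
  };
}

const dbPassword = makeSecret("s3cr3t-value"); // made-up value

const logLine = `connecting with password ${dbPassword}`;
const serialized = JSON.stringify({ user: "app", password: dbPassword });

console.log(logLine);
console.log(serialized);
```

The design choice here is that leaking requires a deliberate call to reveal(), which makes accidental exposure easier to spot in code review.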

5. Debugging is Harder Than Expected

Unlike Terraform, which has a detailed execution plan, Pulumi sometimes feels like a black box when debugging failed deployments.

Pain Points:

Error Messages Are Often Unhelpful: A failed pulumi up might show generic messages that don’t indicate the real root cause.
Dependency Graph Isn’t Always Clear: Terraform's plan output shows exactly what will be changed, while Pulumi sometimes lacks that clarity.
Logging in CI/CD Pipelines is Tricky: In a CI/CD environment, Pulumi logs can be overwhelming and lack structured debugging tools.
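
When the dependency order is unclear, it can help to reason about it explicitly. Here is a small sketch that derives a deployment order from a dependency map using a depth-first topological sort. The graph contents and the deploymentOrder function are hypothetical examples, not Pulumi output; for a real stack, the pulumi stack graph command can export the actual dependency graph.

```typescript
// Hypothetical resource graph: each key lists the resources it depends on.
type DependencyGraph = Record<string, string[]>;

// Depth-first topological sort: dependencies come before their dependents.
function deploymentOrder(graph: DependencyGraph): string[] {
  const result: string[] = [];
  const state: Record<string, "visiting" | "done"> = {};

  function visit(resource: string, path: string[]): void {
    if (state[resource] === "done") return;
    if (state[resource] === "visiting") {
      // Seeing a resource again while still visiting it means a cycle.
      throw new Error("dependency cycle: " + path.concat(resource).join(" -> "));
    }
    state[resource] = "visiting";
    for (const dependency of graph[resource] || []) {
      visit(dependency, path.concat(resource));
    }
    state[resource] = "done";
    result.push(resource);
  }

  // Sort the keys so the result is deterministic for a given graph.
  for (const resource of Object.keys(graph).sort()) visit(resource, []);
  return result;
}

// Made-up Azure-flavoured example.
const exampleGraph: DependencyGraph = {
  functionApp: ["storageAccount", "appServicePlan"],
  storageAccount: ["resourceGroup"],
  appServicePlan: ["resourceGroup"],
  resourceGroup: [],
};

const plannedOrder = deploymentOrder(exampleGraph);
console.log(plannedOrder.join(" -> "));
// prints "resourceGroup -> appServicePlan -> storageAccount -> functionApp"
```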

Conclusion: Should we use Pulumi?

Despite these challenges, Pulumi remains a powerful choice for modern cloud infrastructure. It excels in developer experience, automation, and integration with application code, but teams must be prepared for increased complexity in state management, Azure support limitations, and debugging difficulties.

If your team has strong software engineering practices and is comfortable with async programming, Pulumi can be a great fit. However, if you prioritize predictability, mature tooling, and simple state management, Terraform or Bicep might still be the better option.

Pulumi is not a silver bullet; it is a powerful tool that demands careful consideration. Understanding these real-world Pulumi problems can help teams navigate the trade-offs and adopt best practices to get the most out of their IaC journey.