Mastering Cloud Automation on Azure – The Power of IaC! ☁️

Figure: “*Infrastructure as Code*” – IaC on **Azure**

# Looking back 🔍

How was on-premises infrastructure managed over the last 10 to 15 years? 🤔 Traditionally, managing IT infrastructure was a manual, ticket-driven process. If developers needed a new server, they submitted a request, and an administrator manually provisioned it using a graphical user interface (GUI) or some type of management console – selecting, clicking, and configuring settings (the so-called “ClickOps” approach). 🖱️ This worked when infrastructure was relatively static, with virtual machines (VMs) 🖥️ living for months, years or sometimes even a decade. ⏳

However, the IT landscape and the way IT resources are managed have changed over time. The rise of cloud computing ☁️ brought API-driven infrastructure, with greater flexibility and a scale far beyond what manual processes can handle. Resources and whole landscapes can be provisioned and spun up ⬆️ and down ⬇️ within minutes, and organizations need a way to automate and manage this complexity efficiently.

This is where Infrastructure as Code (IaC) comes in. 👍 By codifying infrastructure definitions and provisioning processes, IaC enables automation, version control, repeatability and reusability. Instead of managing thousands of tickets, IT teams can deploy infrastructure through code and declarative configurations, ensuring consistency, reducing errors, and accelerating workflows. This declarative approach serves as the foundation of IaC. Instead of detailing the steps to achieve a configuration (“how”), you define the desired state of the infrastructure (“what”) and let the IaC tool determine the actions needed to reach that state. 🏎️
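To make the “what” versus “how” idea a bit more tangible, here is a minimal Python sketch of how a declarative tool could derive its actions from a desired state. It is only an illustration and not how Terraform or Bicep work internally; the function `plan` and the resource names (`vnet-hub`, `vm-web`, `vm-legacy`) are made up for this example.

```python
from typing import Dict, List, Tuple

Resource = Dict[str, str]  # attribute name -> value


def plan(desired: Dict[str, Resource], actual: Dict[str, Resource]) -> List[Tuple[str, str]]:
    """Return the (action, resource name) pairs needed to reach the desired state."""
    actions: List[Tuple[str, str]] = []
    # Anything declared but missing gets created; anything that differs gets updated.
    for name in sorted(desired):
        if name not in actual:
            actions.append(("create", name))
        elif desired[name] != actual[name]:
            actions.append(("update", name))
    # Anything that exists but is no longer declared gets deleted.
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Hypothetical desired state (the "what") and the state currently deployed.
desired_state = {
    "vnet-hub": {"address_space": "10.0.0.0/16"},
    "vm-web": {"size": "Standard_B2s"},
}
actual_state = {
    "vm-web": {"size": "Standard_B1s"},
    "vm-legacy": {"size": "Standard_B1s"},
}

for action, name in plan(desired_state, actual_state):
    print(f"{action}: {name}")
```

The author of the configuration only writes `desired_state`; the create, update and delete steps are worked out by the tool.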

In this blog post for the Azure Spring Clean 2025 (#AzureSpringClean), we will explore how IaC transforms infrastructure management, the key benefits it offers, and why it is essential for modern cloud environments like Microsoft Azure. So, let’s dive in! 🤿

# Why focus on IaC? Benefits! 🔝

The short answer – Cost, Speed, Risk.

IaC enables the automated management of IT infrastructure using machine-readable definition files rather than manual configuration through a GUI. This approach brings many advantages for resource provisioning – not only in the cloud, but also on-premises. The IaC approach follows DevOps principles, allowing infrastructure components such as networks, VMs, storage, gateways and many more to be deployed predictably, similar to how source code produces consistent binaries. In other words, 💬 you use code written in a declarative language to define the infrastructure that needs to be deployed. Just like the source code of a .NET web application, the infrastructure code is part of the solution and is stored in the version control system (VCS) as well (e.g., GitHub, Azure DevOps, GitLab, etc.). 🔁

If organizations follow the IaC approach consistently, many benefits 🏆 can be realized, as the following overview shows:

Figure: List of main **advantages** of following the “IaC” approach

# Challenges & Prerequisites ⚠️

“Where there is light, there must be shadow”. ➡️ Implementing IaC in an organization brings significant benefits, but it also presents several big challenges that must be carefully managed to guarantee a successful cloud journey:

Knowledge Gap & Mindset Shift

One of the primary obstacles is the lack of knowledge and skills among traditional IT administrators who are not accustomed to coding. Professionals who have relied on manual configurations for many years may find the transition to a code-based IaC approach extremely difficult. This challenge is compounded by the fact that IaC requires a complete shift in mindset, which can lead to resistance within teams. Moving from a manual, ad-hoc style of infrastructure management to a structured, automated approach often encounters skepticism and reluctance from those unfamiliar with version control, scripting, and automated deployment processes.


Configuration Drift

Another major issue is the risk of configuration drift caused by manual changes. In a traditional setup, administrators might tweak configurations directly through a graphical user interface or the command line, but in an IaC environment, such manual interventions create inconsistencies between the actual infrastructure and the codebase. Enforcing a strict “read-only” policy for direct changes, while ensuring that all modifications go through version-controlled code updates, is crucial but often difficult to implement right from the beginning of a project. Moreover, keep in mind that when the CI/CD pipeline runs, manual changes made via the GUI to settings that are managed in code will typically be reverted to match the code-defined configuration.
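Drift is easier to reason about when you picture it as a plain comparison between two sets of settings. The following Python sketch shows the idea under simplified assumptions: `detect_drift` and the storage account settings are hypothetical and stand in for what an IaC tool compares when it refreshes its view of the deployed resources.

```python
from typing import Any, Dict, List


def detect_drift(code_defined: Dict[str, Any], deployed: Dict[str, Any]) -> List[str]:
    """List every code-managed attribute whose deployed value differs from the code."""
    findings: List[str] = []
    for key in sorted(code_defined):
        expected = code_defined[key]
        found = deployed.get(key, "<missing>")
        if found != expected:
            findings.append(f"{key}: expected {expected!r}, found {found!r}")
    return findings


# Hypothetical storage account settings as defined in code ...
code_defined = {"sku": "Standard_LRS", "https_only": True, "min_tls": "TLS1_2"}
# ... and as found in the cloud after someone changed a setting manually.
deployed = {"sku": "Standard_LRS", "https_only": False, "min_tls": "TLS1_2"}

drift = detect_drift(code_defined, deployed)
for finding in drift:
    print(finding)
```

Only attributes that appear in the code are checked here, which mirrors the point above: the pipeline can only revert what the code actually manages.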


Manage Continuous Updates

IaC tools and cloud providers operate on an evergreen model, where regular updates introduce new features, deprecate old ones, and sometimes break existing configurations. Therefore, organizations must proactively plan, test, and manage these updates to avoid disruptions. Unlike traditional IT infrastructure where quick fixes can be applied directly, IaC enforces changes through code, requiring proper documentation, peer reviews, and approval through merge requests. This structured approach improves stability but can be frustrating when urgent fixes are needed.


Code Structure & Deployment Strategy

Establishing a well-defined code structure and deployment strategy is another significant challenge. Organizations must carefully design how they manage infrastructure across different stages such as DEV, TEST, and PROD. This includes defining branching strategies in a VCS such as Git and ensuring that changes flow smoothly through the pipeline without introducing instability in production. Without a clear strategy, deployments can become chaotic, leading to failed updates and inconsistent environments.

For further details, please see the chapter below about “Git Repository Structure” in this blog article. 👇


# Top 5 – IaC Tools for Azure 🪛

Selecting the right IaC tool depends on various factors such as team expertise, use case complexity and integration requirements as already mentioned in the challenges above. Each of these tools has its strengths 💪 and the choice ultimately depends on the specific needs of the project or organization. When working with Microsoft Azure, several IaC tools can be used to define, provision and manage cloud resources. ☁️

👉 The following are the most popular IaC tools (top 5) for Azure:

HashiCorp Terraform

Terraform, developed by HashiCorp, is a widely used IaC tool (“top dog”) that enables declarative configuration and automation of cloud infrastructure. It uses HashiCorp Configuration Language (HCL) and supports a broad range of Azure services through its Azure provider. Terraform’s state management allows for efficient tracking of resource changes, making it a preferred choice for multi-cloud and hybrid cloud deployments. Key features include a declarative syntax using HCL, state management for tracking infrastructure changes, multi-cloud support (including Azure, AWS, and GCP), and modules for reusable infrastructure components.

1️⃣

Bicep

Bicep is a domain-specific language (DSL) developed by Microsoft as a more readable and manageable alternative to ARM Templates (file extension .bicep). It simplifies the deployment of Azure resources while still compiling down to ARM Templates. Key features include a more concise and readable format compared to JSON-based ARM Templates; no state management required, as it leverages Azure’s native deployment capabilities; strong type safety and IntelliSense support in VS Code; and native integration with Azure DevOps and CI/CD pipelines.

2️⃣

# Modularization 📦

HashiCorp Terraform has become the go-to choice 👍 over the last few years for “infrastructure as code” (IaC) enthusiasts, providing a powerful and flexible way to define, manage, and provision infrastructure resources, not only on Microsoft Azure! But as the infrastructure codebase of an organization grows, keeping it organized, maintainable, and scalable can be a challenge. That is where modularization comes into play. 🥇

When working with IaC, modularization is a fundamental and very important practice that brings structure, maintainability and scalability to infrastructure management. It involves breaking down IaC configurations into reusable and independent modules, rather than maintaining a single, monolithic configuration.

In Terraform, for instance, modularization is achieved by creating reusable modules for resources like Azure storage accounts, virtual networks, databases or VMs, as the following screenshot illustrates.

⚠️ Every single module should live in its own repository! Do not create ‘monster’ 👹 repos containing multiple modules! 🤢🤮

A well-structured central module library (“registry”) 📕 in Terraform may include a lot of modules over time, making it easier to manage complex infrastructure at scale. By adopting modularization, organizations improve efficiency, minimize errors, and ensure that their infrastructure remains flexible and adaptable to change. It is a best practice that should be at the core of any well-architected IaC strategy.

There are 10 compelling reasons (or even more) 😉 to focus on modularization in Terraform, as the following chart shows:

Figure: How to create **good** modules in Terraform?

📢 For a deeper dive into Terraform modularization, please check out my detailed blog post: 👉 The Art of Modularization in Terraform. 📦

# Workspaces in Terraform CLI 🚀

Workspaces are a built-in feature of Terraform CLI that allow you to manage multiple instances (or environments) of the same Terraform configuration within a single backend, without modifying or switching authentication credentials. They enable logical separation of infrastructure environments, such as DEV, TEST, and PROD, without the need to create and manage multiple configuration files or state files manually.

When using Terraform with a remote backend (such as an Azure Storage Account), workspaces provide a way to dynamically create and switch between different state files. Each workspace maintains its own separate Terraform state, ensuring isolation between different environments or deployments. 📄

⚠️ The initially created “default” workspace cannot be deleted! But new workspaces like DEV, TEST, and PROD can be added. 📚
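The behavior described above can be pictured with a small toy model in Python. It is a sketch for illustration only and not Terraform code: the class `WorkspaceStore` and its methods are hypothetical, loosely named after the `terraform workspace new`, `select`, `delete` and `list` commands, and the VM sizes are example values.

```python
from typing import Dict, List


class WorkspaceStore:
    """Toy model: one isolated state per workspace; "default" always exists."""

    def __init__(self) -> None:
        self._states: Dict[str, Dict[str, str]] = {"default": {}}
        self.current = "default"

    def new(self, name: str) -> None:
        """Create a workspace with its own empty state and switch to it."""
        if name in self._states:
            raise ValueError(f"workspace {name!r} already exists")
        self._states[name] = {}
        self.current = name

    def select(self, name: str) -> None:
        if name not in self._states:
            raise KeyError(f"workspace {name!r} does not exist")
        self.current = name

    def delete(self, name: str) -> None:
        if name == "default":
            raise ValueError("the default workspace cannot be deleted")
        if name == self.current:
            raise ValueError("cannot delete the currently selected workspace")
        del self._states[name]

    def record(self, resource: str, value: str) -> None:
        """Write into the state of the currently selected workspace only."""
        self._states[self.current][resource] = value

    def state(self, name: str) -> Dict[str, str]:
        return dict(self._states[name])

    def names(self) -> List[str]:
        return sorted(self._states)


store = WorkspaceStore()
store.new("dev")
store.record("vm-web", "Standard_B1s")
store.new("prod")
store.record("vm-web", "Standard_D2s_v5")

print(store.names())
print(store.state("dev"), store.state("prod"))
```

The same configuration is applied in both workspaces, but each one keeps its own record of what was deployed, which is exactly the isolation that separate state files provide.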

👉 Please keep in mind that there are some differences between Workspaces in Terraform CLI and HCP Terraform Workspaces.

# Git Repository Structure 📄

Properly structuring the Terraform project repository is crucial for maintainability, scalability, and collaboration. ↗️ A well-organized repository enhances efficiency by enabling teams to manage infrastructure code in a systematic and modular manner. By adopting best practices for repository structure, teams can ensure better version control, environment isolation, reusability, and security.

Terraform projects often involve multiple environments, teams, and cloud services, making it essential to implement a structure that minimizes complexity while maximizing flexibility. Choosing the right approach – whether a Mono-Repo or Multi-Repo strategy – depends on factors such as team size, governance requirements, and the complexity of infrastructure components. Additionally, employing a consistent module strategy and leveraging automation can significantly improve workflows and operational efficiency.

The following main questions regarding repository structure come up again and again when deploying resources with Terraform:
❓Is the repository for the Terraform project set up the right way?
❓What are the best practices for managing Terraform code in general?
❓What is the recommended approach?

The points and approaches in the following subchapters of this article outline the best practices for structuring Terraform repositories, providing recommendations on folder organization, branch management, CI/CD automation, etc.

Mono-Repo 🙄

A Mono-Repo is a single repository that contains all Terraform configurations (including application code, infrastructure components, etc.) for different environments and services, referenced by each business domain, product or team. There are two Mono-Repo approaches, which are explained in the following two subchapters with illustrations:

Single Workspace per repository branch 🌿

✔️ Used for long-running branches, with fewer files to manage.
✔️ Create one branch and Terraform workspace for each environment (e.g. DEV, TEST, PROD).
✔️ Listens to changes on a specific branch.
✔️ The code runs when a pull request is created or a push event occurs.
✔️ Promote configuration across the stages ➡️ merge the branches.
✔️ Limit permissions on branch level. 🔐

❌ Unintentional configuration drifts (“out of sync”) of the different environments/stages. 🪓
❌ Hard to manage “.tfvars” files with environment-dependent values. 🤬
❌ “Cherry-pick” 🍒 often needed.
❌ If the cloud infrastructure gets more and more complex, administration of branches can end up in a mess quite easily. 😩
❌ Git branch and TF management “overkill” ☠️

Figure: **Mono**-Repo – Single Environment per **Branch** 🌿

Single Workspace per repository directory 📁

✔️ Used for significant differences per environment/stage.
✔️ Separate directory for each environment. Used for short-lived branches which are constantly merged into the “main” branch. 🔜
✔️ Different infrastructure resources and configurations for different stages are easily implementable.
✔️ Create a Terraform workspace for each environment directory. 📁
✔️ The code runs for all stages when a pull request is created towards the “main” branch. 📚
✔️ Single source of truth for each environment.

❌ Higher risk of accidental changes in the wrong environment. 🤕
❌ Consistent module tagging is difficult, because the version tag affects the whole repository.
❌ Harder to integrate with workflows relying on directory-based structures. 🔁
❌ Limited options for fine-grained access control per stage. 🔐 Use Code Owners as a mitigation.
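One way to reduce the risk of touching the wrong environment is a small pipeline check that reports which environment directories a change affects. The Python sketch below assumes a hypothetical layout of `environments/<stage>/...`; the function name and the paths are made up for illustration and would need to match your own repository.

```python
from pathlib import PurePosixPath
from typing import Iterable, Set


def affected_environments(changed_files: Iterable[str], root: str = "environments") -> Set[str]:
    """Return the environment directories touched by a list of changed file paths."""
    environments: Set[str] = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Expect at least: <root>/<environment>/<file>
        if len(parts) >= 3 and parts[0] == root:
            environments.add(parts[1])
    return environments


# Hypothetical list of files changed in a pull request.
changed = [
    "environments/dev/main.tf",
    "environments/prod/variables.tf",
    "README.md",
]
print(sorted(affected_environments(changed)))
```

A pipeline could use such a result to require an extra approval whenever `prod` shows up in the set.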

Figure: **Mono**-Repo – Single Environment per **Directory** 📁

👷 From a practical point of view:
Honestly, each pattern has its own flavor. 🍲 I have never used these 2 approaches in “bigger” enterprise/real-life scenarios, due to several key limitations. Firstly, modules are tightly coupled to a dedicated project because of the folder structure, and a centralized/shared module library 📕 or Terraform registry for better reusability and management is missing. 🏛️ Secondly, versioning for modules becomes unmanageable, as changes require copying and modifying entire folders instead of using proper version tags (semantic versioning). Thirdly, these approaches introduce a higher risk of configuration drift, making it difficult to maintain consistency and control across environments. Lastly, they do not scale well, as managing multiple directories for modules becomes increasingly complex and error-prone as infrastructure grows. 🤢🤮

➡️ Given these constraints, I have opted for alternative strategies that provide better modularity, version control, scalability, and stability, such as the “Multi-Repo” 🥰 approach explained in the next chapter. 👍😉

Multi-Repo 👍😘

The Multi-Repo approach uses separate repositories for different business domains, products or teams. Terraform modules are stored in individual repositories and referenced using (semantic) versioning (e.g., v1.3.1) when needed, as already mentioned in the modularization section of this post. This approach, mainly used for enterprise projects, has the following advantages and disadvantages:

✔️ Good approach for complex projects over multiple large teams.
✔️ Enable teams to self-service on shared/reusable modules over a (private) Terraform registry.
✔️ Module versioning using tags with semantic versioning (e.g., v2.4.6) on each module repository.
✔️ Separated module repos allow greater control (RBAC). 🔐
✔️ Blast radius is limited only to dedicated/changed repositories.
✔️ Application code (“logical layer”) is separated from the modules.
✔️ Faster development and flexibility. 🏎️

❌ Managing a high number of modules across multiple repositories can lead to administrative challenges.
❌ A high level of automation is needed (“automation of the automation”) regarding Git.
❌ Keeping module versions (e.g., provider updates) aligned across repos takes continuous effort.
❌ Each repository requires its own CI/CD configuration (a repo template is recommended) and maintenance.
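Since module versioning with semantic version tags is central to the Multi-Repo approach, here is a small Python sketch of how such tags can be compared correctly. It is an illustration under simple assumptions: tags follow the `vMAJOR.MINOR.PATCH` pattern, the helper names are hypothetical, and “compatible” is simplified to “same major version”, which is not identical to Terraform's own version constraint syntax.

```python
import re
from typing import List, Optional, Tuple

Version = Tuple[int, int, int]
TAG_PATTERN = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def parse_tag(tag: str) -> Optional[Version]:
    """Turn a tag like 'v2.4.6' into (2, 4, 6); return None for anything else."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)


def latest_compatible(tags: List[str], major: int) -> Optional[str]:
    """Return the highest valid tag within the given major version, if any."""
    candidates = []
    for tag in tags:
        version = parse_tag(tag)
        if version is not None and version[0] == major:
            candidates.append((version, tag))
    if not candidates:
        return None
    # Tuples compare numerically, so (2, 10, 0) ranks above (2, 9, 3).
    return max(candidates)[1]


# Hypothetical tags found on a module repository.
module_tags = ["v1.3.1", "v2.4.6", "v2.10.0", "v2.9.3", "v3.0.0", "latest"]
print(latest_compatible(module_tags, major=2))
```

Comparing the parsed numbers instead of the raw strings matters: a plain string sort would rank `v2.9.3` above `v2.10.0`.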

# Impact on IT organization 🏭

The cloud transformation with “Infrastructure as Code” (IaC) has a huge impact on the IT organization of an enterprise, affecting people, processes, and technology. 📳 It introduces automation, agility, and scalability while changing traditional IT roles and responsibilities. The chart below is an outline of possible team structures, the key impacts and a possible IT operating model including responsibilities and examples.

Figure: Future Cloud **Operating Model** – *Platform*, *Product* & *Solutions* Team

Solutions/Application Team

The “Solutions Team” focuses on delivering specific business solutions that leverage cloud infrastructure. It works closely with stakeholders to design, develop, and deploy applications, workloads or services. Moreover, it integrates product capabilities to deliver business apps, ensuring they are deployed efficiently in the cloud.

📄 Responsibilities:
– Understand business needs and translate them into technical solutions.
– Deploy and manage applications in the cloud.
– Work with the “Platform Team” to use pre-defined infrastructure patterns.
– Define infrastructure requirements for their specific solutions/applications.
– Use IaC tools (e.g., Terraform, Bicep, etc.) to deploy solution-specific cloud resources in collaboration with the “Product Team”.
– Ensure security, compliance, and cost-effectiveness of deployed solutions.
– Monitor and optimize application performance.

✏️ Example:
The “Solutions Team” might develop and deploy a data analytics platform on Azure, requiring compute instances, storage, databases and networking, all defined in Terraform.

➡️

Product Team

The “Product Team” is responsible for the lifecycle of a specific product (e.g., SaaS application, API service) and ensures continuous delivery, improvement, and user satisfaction. It builds and manages applications that run on the infrastructure, ensuring high availability and performance, and forms the link between the application world and the platform world. 🔁

📄 Responsibilities:
– Define and prioritize product features based on user feedback.
– Ensure the product is scalable, reliable, and secure in the cloud environment.
– Manage cloud infrastructure required for their product using IaC.
– Work with the “Solutions Team” to integrate product capabilities into solutions.
– Collaborate with the “Platform Team” to ensure standards and best practices are followed.
– Automate CI/CD pipelines for rapid and reliable deployments.
– Monitor application health and performance metrics.
– Optionally provide predefined Terraform templates for application workloads.

✏️ Example:
The “Product Team” would ensure that the necessary Azure resources for an AI-based system (e.g., GPUs, Kubernetes, AI Services, ML, etc.) are available and optimized, using the Terraform modules from the enterprise’s central module library (“registry”) provided by the “Platform Team”.

🏛️

Platform Team

Successful cloud adoption depends on a comprehensive and highly standardized strategy for deploying, securing, and managing many applications within an enterprise. To achieve this, organizations establish the “Platform Team”. The members of this team provide standardized tools, cloud providers, systems and foundational infrastructure for solution/application teams, such as automated workflows, default golden images, cloud landing zones, and so on. By replacing manual tasks with shared services, they streamline deployments and enable developers to work more efficiently (“acceleration”).

The “Platform Team” builds and maintains the baseline cloud infrastructure and shared services that other teams (“Solutions Team” & “Product Team”) use. The team’s task is to standardize the cloud approach across the whole organization:

📄 Responsibilities:
– Design and manage reusable infrastructure components (e.g., networking, storage, database, etc.).
– Provide standardized Terraform modules or blueprints for other teams.
– Automate infrastructure provisioning using Terraform, Bicep, or other IaC tools.
– Enforce governance, security, and compliance standards.
– Monitor and optimize cloud cost, performance, and security (“observability”).
– Support DevOps practices, CI/CD pipelines, and automation frameworks.
– Enable self-service infrastructure provisioning or central module libraries for “Solutions Teams” and “Product Teams”.

✏️ Example:
The “Platform Team” might create Terraform modules for a standard “Spoke” network or “SQL database” setup, which all teams can consume via the central Terraform registry to ensure consistency and security.

🏗️

A cloud transformation with IaC reshapes IT organizations tremendously, making them more agile, automated, and efficient. While the “Solutions Team”, “Product Team”, and “Platform Team” each have distinct responsibilities, their interactions are not rigid but rather dynamic and collaborative (interdisciplinary/cross-functional “Business DevOps” Teams).

The “Solutions Team” often works closely with the “Product Team” when implementing business-specific applications that depend on product capabilities. In many cases, a “Solutions Team” might identify gaps or opportunities for product enhancements, leading to a feedback loop that influences the roadmap of the “Product Team”.

Similarly, the “Product Team” and the “Platform Team” collaborate to ensure that infrastructure standards and best practices are followed while maintaining flexibility for innovation. The “Product Team” relies on the standardized infrastructure components and automation tools of the “Platform Team”, but in some cases it might need custom solutions or Terraform templates, prompting discussions and iterative improvements to the platform. In practice, the boundaries between the “Platform Team” and the “Product Team” become blurred. In some cases, this can even mean that the teams merge within an enterprise. Moreover, the “Platform Team” works closely together with the “Governance Team” of an organization, according to the function structure of the “Cloud Center of Excellence” (CCoE). This collaboration 👥 ensures that all guidelines are implemented on cloud platforms, application workloads remain compliant, and cloud control is maintained. 🎛️

The “Solutions Team” and “Platform Team” also work together to align deployment strategies, ensuring efficient infrastructure usage without introducing unnecessary complexity. While the “Platform Team” provides reusable infrastructure modules and governance policies, the “Solutions Team” contributes real-world implementation insights that can refine and evolve these foundational services.

Rather than operating in silos, all 3 teams continuously interact, share knowledge, and adapt to evolving requirements (“Biz-DevOps-Teams”). This fluid collaboration ensures that cloud infrastructure remains scalable, efficient, and aligned with both business and technical needs. 🏁