ARGLabs fictitious company

So you are new to our company. Congratulations!

Our DevOps culture

We like the DevOps “You build it, you run it.” culture.

It means that our Cloud Engineers (call them DevOps, SREs, infrastructure engineers or whatever you want) will help you with the basic cloud infrastructure on which your applications run, but they won’t be called to solve your apps’ problems and incidents. Please take ownership of everything you bring up on the cloud, including costs and incidents.

Cloud Engineers can always help you with architecture decisions and deployment pipelines, but once it’s running, it’s up to you to keep things up.


Cloud Engineering team

We have a Cloud Engineering team that is responsible for most of the shared services on the cloud, and for the best practices and guidelines on how to run and deploy our applications.

Some of the things they are responsible for:

  • Networking
  • DNS Infrastructure
  • Guidelines
    • Deployment pipelines
    • IaC
    • AWS services usage


Monitoring team

The monitoring team will help you use the monitoring, incident management and escalation tools.

They are not responsible for creating the monitoring items, thresholds and alarms for your applications.

Every app is different, and the best team to configure an app’s monitoring parameters will always be the team that has to resolve its incidents: the application team.

The monitoring team will help you test integrations and notifications, generate metrics and reports, and so on.

Incident First Responders

The monitoring team offers a service called “Incident First Responders” that you can choose to use or not.

The first responders are a 24×7 team that can take basic actions when there is an incident with your application.

You should document all the foreseeable problems and the actions you want the first responders to take in response to each alert, but keep in mind that any manual action you tell them to take will break your IaC code sooner or later, because the running infrastructure will no longer match what the code describes.

If they solve the problem by following your procedure, they will create a card on your board reporting everything about the incident, so you’ll know whether you have to change any code before deploying the app again. If the incident persists, they will follow the escalation plan you have made and call someone on your team. They only call the Cloud Engineers if the incident involves one of the Cloud Engineers’ services.
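The flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of the actual tooling: the names `Runbook`, `IncidentResult` and `handle_alert`, the alert names, and the idea that a scripted action reports success as a boolean are all hypothetical. The Cloud Engineers escalation path is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Runbook:
    """Hypothetical runbook an application team hands to the first responders."""
    # Alert name -> scripted action; the action returns True if it resolved the incident.
    actions: Dict[str, Callable[[], bool]]
    # Ordered list of team contacts to call when the action does not resolve the incident.
    escalation_plan: List[str]


@dataclass
class IncidentResult:
    resolved: bool
    card: str = ""                     # text of the card created on the team's board
    escalate_to: Optional[str] = None  # first contact in the team's escalation plan


def handle_alert(alert: str, runbook: Runbook) -> IncidentResult:
    """Run the team's procedure for an alert, then report or escalate."""
    action = runbook.actions.get(alert)
    if action is not None and action():
        # Procedure worked: leave a card so the team can update its IaC code.
        card = f"Incident '{alert}' resolved by runbook action; review IaC before the next deploy."
        return IncidentResult(resolved=True, card=card)
    # No procedure for this alert, or the incident persists: follow the escalation plan.
    contact = runbook.escalation_plan[0] if runbook.escalation_plan else None
    return IncidentResult(resolved=False, escalate_to=contact)
```

For example, with a runbook such as `Runbook(actions={"disk_full": clean_tmp}, escalation_plan=["on-call dev"])` (both names made up), a `disk_full` alert that the action fixes ends with a card on the board, while any alert without a working procedure results in a call to the first contact in the plan.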


Application Teams

Apps teams are responsible for:

  • Deploying, running and monitoring the team’s apps
  • Evolving the team’s infrastructure repositories

Teams should use the NOC service to execute basic scripted steps in specific failure scenarios, but the team always remains responsible for these actions.