Site Reliability Engineer, Cloud Infrastructure - USDS
- Los Angeles, CA
- Permanent
- Full-time
Creation is the core of TikTok's purpose. Our platform is built to help imaginations thrive. This is doubly true of the teams that make TikTok possible.
Together, we inspire creativity and bring joy - a mission we all believe in and aim towards achieving every day.
To us, every challenge, no matter how difficult, is an opportunity: to learn, to innovate, and to grow as one team. Status quo? Never. Courage? Always.
At TikTok, we create together and grow together. That's how we drive impact - for ourselves, our company, and the communities we serve.
Join us.
The infrastructure team of the US Tech Services Department at TikTok supports the company's fast growth by building and operating hyper-scale datacenters, managing the lifecycle of the server fleet, providing cloud solutions, and developing infrastructure services that are scalable and reliable.
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed infrastructures. Our SREs are tasked with ensuring that infrastructure services are reliable, fault-tolerant, efficiently scalable, and cost-effective.
To enhance collaboration and cross-functional partnerships, among other things, our organization currently follows a hybrid work schedule that requires employees to work in the office 3 days a week, or as directed by their manager/department. We regularly review our hybrid work model, and the specific requirements may change at any time.
Responsibilities
- Build tools, automations, visualizations and monitors to facilitate the operation and optimization of the global infrastructure.
- Work in a fast-paced environment. Participate in technical operations and rotations in response to performance and reliability issues.
- Help improve the whole lifecycle of infrastructure services, from inception and design through development, deployment, user support, and refinement.
Qualifications
- Master's degree (or Bachelor's degree with 3+ years of experience) in Computer Engineering, Electrical Engineering, Computer Science, or a related major.
- 3+ years of experience working with Unix/Linux systems, from kernel to shell and beyond, including system libraries, file systems, and client-server protocols.
- 3+ years of experience in one or more programming languages such as Java, C++, or Go, or scripting experience in Shell and Python.
- Self-driven, capable of coping with ambiguity and moving projects from concept to delivery.
- Strong analytical skills and the ability to solve real-world problems in a fast-moving environment.
- Experience in designing, analyzing, and building automation and tools for large-scale systems.
- Experience in building solutions with AWS, Google Cloud, Azure, and other cloud services.
- Experience in networking technologies such as TCP/IP, BGP, DNS, etc. in a carrier-grade environment.
- Experience in developing and operating one or more of the following systems: OpenStack, Kubernetes, Nginx, ipvs, ELK stack, Hadoop, etc.