Role

Operate and maintain our distributed storage systems (file, block, and object storage) day to day, including online releases, software deployment, monitoring, inspection, and alarm response, to ensure the reliability of our distributed storage services.
Become familiar with the architecture of the distributed platform; find, debug, and resolve common faults, latent risks, and performance problems; and execute emergency plans and fault recovery strategies.
Help build automation tools and systems, and use and develop tools and platforms that improve overall operational efficiency.

Minimum qualifications:

Bachelor's degree in Computer Science or a related technical field, or equivalent practical experience.
Experience with Unix/Linux operating system internals (e.g., filesystems, storage devices) and with networking (e.g., TCP/IP, routing) or cloud systems.
Experience analyzing and troubleshooting storage systems.
Experience programming in one or more languages such as Shell, Python, or Go.

Preferred qualifications:

Experience designing or managing large-scale distributed storage systems, with an understanding of distributed systems principles and familiarity with open-source distributed storage systems (e.g., NAS, HDFS, Ceph).
Experience with operations work, including using SRE tools and systems (e.g., online release, monitoring, daily inspection) and scripting.
Ability to quickly become familiar with storage systems and their related management systems.
Systematic problem-solving approach coupled with effective communication skills and a sense of drive.
Ability to communicate fluently in English with both the domestic team and the local team; Chinese language skills are preferred.