The work we do at Autodesk touches nearly every person on the planet. By creating software tools for making buildings, machines, and even the latest movies, we influence and empower some of the most creative people in the world to solve problems that matter.
As a Research Engineer at Autodesk Research, you will work side-by-side with world-class researchers and engineers to build new ML-powered product features that will help our customers imagine, design, and make a better world. You are a software engineer who is passionate about solving problems and building things. You have experience building datasets that combine different data modalities such as text, images, and 3D models. Your skills span CAD data processing, analysis, indexing, retrieval, and experimentation at multiple scales. You are excited to collaborate with AI researchers to build datasets that power generative AI features in Autodesk products. You are a good communicator and comfortable working at the intersection of research and product.
The location of this role is flexible. We are a global team with members in London, San Francisco, and Toronto, as well as remote colleagues. Autodesk is a flexible hybrid-first company, allowing employees to work remotely, in an office, or a mix of both.
- Own and lead engineering projects in the area of data acquisition, ingestion, and curation
- Organize and curate large, unstructured, disparate multi-modal (text, images, 3D models, video) data sources into a unified format suitable for machine learning
- Develop and deploy highly scalable distributed systems to process, filter, and deploy datasets for use with machine learning
- Conduct and analyze experiments on data to provide insights
- Write robust, testable code that is well documented and easy to understand
- Analytical advisor role that requires understanding of the theories and concepts of a discipline and the ability to apply best practices
- A common career stabilization point (AKA the “full-contributor” level) for Professional roles
- Requires knowledge and experience such that the incumbent can understand the full range of relevant principles, practices, and practical applications within their discipline
- Solve complex problems of diverse scope by taking a new perspective on existing solutions and applying knowledge of best practices in practical situations
- Use data analysis, judgment, and interpretation to select the right course of action
- Apply creativity in recommending variations in approach
- “Connect the dots” of assignments to the bigger picture
- May lead projects or key elements within a broader project
- May also have accountability for leading and improving on-going processes
- Build effective relationships with more senior practitioners and peers, and build a network of external peers
- Work independently, with close guidance given at critical points
- May begin to act as a mentor or resource for colleagues with less experience
- BSc or MSc in Computer Science, or equivalent industry experience
- Experience with software version control, unit tests, and deployment pipelines
- Strong programming skills
- Strong data modelling, architecture, and processing skills with varied data representations including 2D and 3D geometry
- Excellent written communication skills to document code, data analysis, and findings from experiments
- Experience with cloud services & architectures (AWS, Azure, etc.)
- Experience with relational databases (e.g., MySQL, PostgreSQL) and NoSQL databases (e.g., MongoDB, Cassandra)
- Experience with frameworks such as Ray Data, Metaflow, Hadoop, Spark, and Hive
- Experience with implementing ML models
- Experience working with large data lakes and data streams
- Proficiency with Linux systems and bash terminals
- Experience with computational geometry such as mesh or boundary representation data processing
- Experience with CAD model search and retrieval, in PLM systems or other searchable CAD databases
- Knowledge of the design, manufacturing, AEC, or media & entertainment industries
- Knowledge of statistics
- Ability to analyze data and communicate results effectively using tools such as Pandas, Matplotlib, Seaborn, Plotly, R, or others
- Experience using open-source pre-trained language and vision/language models such as BERT, Llama, LLaVA, etc.
- Experience with NLP tools such as spaCy, NLTK, Gensim, etc.
- Experience with creating large-scale datasets and benchmarks
- Team player with a high degree of curiosity
- Not intimidated by the details of domain-specific file formats, with the self-drive and creativity to connect the dots between information stored in different sources to provide new and useful features for machine learning models
- Proficiency in software engineering and cloud-based systems to deliver these features to machine learning projects through the creation and deployment of scalable data pipelines