Software Development Intern

2 Years ago • All levels

Job Summary

Job Description

This Software Development Internship offers the opportunity to gain experience with a cutting-edge data processing system. The intern will join the team behind a high-performance analytical database system, learning about high-performance implementations in the database kernel and in its distributed version based on the Hadoop environment. As part of the ActianX/Vector team, the intern will work in C and be mentored by senior developers, with responsibilities covering design and implementation challenges. The role involves contributing to the design, implementation, testing, and maintenance of the data processing kernel, participating in continuous integration management, and learning about state-of-the-art database development.
Must have:
  • Enrolled in a Bachelor’s or Master’s program in computer science.
  • Good programming skills in the C family.
  • Well-founded algorithm-design skills.
Good to have:
  • Experience in software development.
  • Experience with large-scale systems development.
  • Competent script programming skills (Python, Bash).
  • Experience with concurrent, parallel and network programming techniques.
  • Experience in business intelligence and data processing.
  • Knowledge of operating systems internals.
  • Knowledge of database concepts and technology.
  • A working knowledge of SQL.
Perks:
  • Internships lasting from 3 to 12 months.
  • Tailored projects for research and implementation.
  • Master Topics coordinated with universities.
  • Part-time jobs for first work experience.
  • Flexible working hours.

Job Details

We offer a position as Software Development Intern with the opportunity to gain experience with a cutting-edge data processing system. This is your chance to become a member of the team behind the fastest analytical database system on the market. You will learn about high-performance implementations in the database kernel and in its distributed version based on the Hadoop environment.

You will be part of our ActianX/Vector team, engineering a family of products in C that are provided for a variety of OS distributions. We are looking for team players who integrate well into our distributed development team. You will be mentored by senior developers and introduced to our software development process, with responsibilities that cover design and implementation challenges.

 
KEY RESPONSIBILITIES:  
Be curious and eager to learn more about state-of-the-art database development
Contribute to the design and implementation of enhancements that improve the performance, stability and scalability of our high-performance data processing kernel

Create tests for the implemented functionality and maintain it in our continuous integration environment

ESSENTIAL QUALIFICATIONS:   
Enrolled in a Bachelor’s or Master’s degree program in computer science or equivalent at a German university
Good programming skills (C family) 
Well-founded algorithm-design skills 

OPTIONAL SKILLS: 
Experience in software development, e.g., from advanced academic studies or in a commercial setting
Experience with large-scale systems development 
Competent script programming skills (Python, Bash) 
Experience with concurrent, parallel and network programming techniques 

Experience in the areas of business intelligence, high-performance data processing, computer architecture and related fields
Knowledge of operating systems internals (memory management, IO, scheduling etc.) 
Knowledge of database concepts and technology 
A working knowledge of SQL 

WE OFFER:

- Internships – An internship with us may last from 3 to 12 months. For each internship, we provide a tailored project to research, design and implement new functionality in our Vector database.
- Master topics – In coordination with your university’s examination office and your collaborating professor, we will define a master project tailored to your needs and based on our available thesis topics.
- Part-time jobs – Part-time jobs with us provide you with the opportunity to gain your first work experience in a field related to your study program. Your contribution will help improve an already outstanding database product. Working hours and times are flexible and can be discussed when you decide to start a project with us.
 
Below you will find a list of topics together with a short explanation. These topics are either marked with (I)nternship, (M)aster topic and/or (J)ob.
 
Vector cloud deployment.
Providing databases both on premise and in the cloud as a hybrid offering is a promising and already growing business. Our goal is to bring an on-premise Vector partially to the cloud; within this project, the task is to take the cloud storage architecture used in Avalanche and bring it to the Vector product. (I,M)

Load balanced query execution in a clustered environment.
Load-balancing in a cluster is hard, because normally you cannot offload work to another node if that node does not have the data to work on. However, as the HDFS integration of VectorH controls its replication policy, there are opportunities to shift work to other cluster nodes that already have the data. This requires developing strategies for data placement, data processing and work shifting. (I,M)

External tables in Spark
VectorH supports reading data from external data sources such as Spark. The performance of queries accessing such external tables could be improved greatly, e.g., by pushing down selections or even subtrees of the query plan into Spark. The feature could also be extended with support for more sources and data types. (I,M)

Compact hash tables.
Smaller hash tables can be significantly faster, thanks to fewer CPU cache and TLB misses. The goal of this project is to find such compact representations by bit-packing multiple columns and using dictionaries for string data. (M)
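The bit-packing idea can be sketched in a few lines of C. This is an illustrative sketch only, not Vector's implementation; the bit widths (12 bits for one column, 20 for the other) are assumptions that a real system would derive from column statistics such as min/max values:

```c
#include <stdint.h>

/* Illustrative sketch: pack two small-domain columns into a single
 * 32-bit hash-table key.  The widths (12 + 20 bits) are assumptions;
 * a real system would derive them from column min/max statistics. */
static inline uint32_t pack_key(uint32_t col_a, uint32_t col_b)
{
    /* col_a must fit in 12 bits, col_b in 20 bits */
    return (col_a << 20) | (col_b & 0xFFFFFu);
}

static inline void unpack_key(uint32_t key, uint32_t *col_a, uint32_t *col_b)
{
    *col_a = key >> 20;
    *col_b = key & 0xFFFFFu;
}
```

A narrower key means more entries per cache line, which is where the fewer-cache-and-TLB-misses benefit comes from.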

PDTs on flash.
The goal of this project is to modify our structure for differential updates (Positional Delta Trees, PDTs) so that it can expand to disk. This requires the addition of a layer that resides on disk, most likely a flash disk. Expanding PDTs to flash would make it possible to store many more updates, hence allow less frequent checkpoints (at which PDT updates are merged into the main data storage structures), and lead to the system being able to sustain much higher update workloads. This is a current research project with TU Ilmenau DBIS. (I,M)

Collations.
This project involves understanding the current use of character sets in Ingres, including the way these character sets collate data, and making these rules available for Vector as well. Providing good performance in that case is very difficult, since some mechanisms rely heavily on expanding characters before processing them. Finding cache-efficient algorithms for these cases is also part of the project. As an example, consider the order of “a”, “b” and “ä”: plain byte or code-point order yields “abä”, while the German language typically requires “aäb”. (I,M)
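To make the two orderings concrete, here is a minimal two-level collation compare in C. It is illustrative only, not the Ingres/Vector mechanism, and it assumes Latin-1 input where "ä" is the byte 0xE4: the primary weight folds "ä" onto "a" (giving a < ä < b), and a secondary weight breaks ties so "a" still sorts before "ä".

```c
#include <stddef.h>

/* Illustrative two-level collation for Latin-1 strings (not the
 * actual Ingres/Vector mechanism). */
static unsigned char primary(unsigned char c)   { return c == 0xE4 ? 'a' : c; }
static unsigned char secondary(unsigned char c) { return c == 0xE4 ? 1 : 0; }

int collate_de(const unsigned char *s, const unsigned char *t)
{
    size_t i = 0;
    for (; s[i] && t[i]; i++)           /* primary pass: base letters  */
        if (primary(s[i]) != primary(t[i]))
            return (int)primary(s[i]) - (int)primary(t[i]);
    if (s[i] != t[i])                   /* shorter string sorts first  */
        return (int)s[i] - (int)t[i];
    for (i = 0; s[i]; i++)              /* secondary pass: diacritics  */
        if (secondary(s[i]) != secondary(t[i]))
            return (int)secondary(s[i]) - (int)secondary(t[i]);
    return 0;
}
```

Under this compare, "a" < "ä" < "b", whereas plain byte order (`strcmp`) puts "ä" after "b".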

Spatial data type support.
The goal of this project is the integration of geospatial datatype support into Vector. This requires the definition of new Vector datatypes and the integration into all stages of query execution. (I,M)

Tuple layout planning.
In this project, we want to challenge the way data is stored during query processing. In principle, any mix between horizontal and vertical storage (NSM vs. DSM) can be chosen. Some columns may actually be processed in vertical vectors, while other columns are processed in a tuple layout. Horizontal storage of data inside hash tables is already supported but needs to be extended to other operators. (M)
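The two layout extremes can be sketched in C (the type and field names here are hypothetical, not Vector internals): NSM keeps each tuple together, DSM keeps each attribute in its own array, processed as a vector.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the two tuple-layout extremes. */
struct row  { int32_t id; double price; };    /* NSM: horizontal tuple */
struct cols { const int32_t *id;              /* DSM: vertical vectors */
              const double  *price; };

/* Column-wise primitive: streams through the price array only and
 * never pulls the id column into cache. */
double sum_price_dsm(const struct cols *c, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += c->price[i];
    return s;
}

/* Row-wise equivalent: strided access, where each cache line also
 * carries the id field this query does not need. */
double sum_price_nsm(const struct row *r, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += r[i].price;
    return s;
}
```

A layout planner would pick per column, per operator, which of these representations to use.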

RDF in Vector. In principle, it should be possible to turn Vector into a highly efficient engine for RDF storage and query evaluation. This entails the storage of quads in a compressed PAX format, and a basic translation of SPARQL to SQL or even direct Vector algebra. (M)

Exploiting co-processors for Vector.
The most powerful piece of hardware in today’s average PC is the GPU, not the CPU. There have been studies on how to express database operations of almost every conceivable type on GPUs. However, what is missing is a framework in which complex queries consisting of many such operations can work together. (M)

Maintenance of our testing infrastructure. For our number-one-scoring TPC-H experiments we need to constantly stay up to date. Test numbers for our own improvements need to be recorded and maintained. In addition, all tests and comparisons need to be kept up-to-date with our competition (Impala, Hawk, SparkSQL and Hive). (J)

Adaptation of conversion functions.
There are many built-in datatype conversion functions that are slow compared to an optimized implementation. Replacing these functions will directly impact the affected queries and lead to noticeable performance improvements. (J)
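As a sketch of the kind of replacement meant here (a generic routine, not Actian's code): converting a uint32 to decimal text two digits at a time via a lookup table roughly halves the divisions of the classic one-digit-per-iteration loop and avoids snprintf overhead.

```c
#include <stdint.h>
#include <string.h>

/* Lookup table of all two-digit pairs "00".."99". */
static const char DIGIT_PAIRS[201] =
    "00010203040506070809101112131415161718192021222324"
    "25262728293031323334353637383940414243444546474849"
    "50515253545556575859606162636465666768697071727374"
    "75767778798081828384858687888990919293949596979899";

/* Writes the decimal form of v into buf (>= 11 bytes), returns length. */
size_t u32_to_str(uint32_t v, char *buf)
{
    char tmp[10];
    char *p = tmp + sizeof tmp;
    while (v >= 100) {                 /* emit two digits per division */
        unsigned idx = (v % 100) * 2;
        v /= 100;
        *--p = DIGIT_PAIRS[idx + 1];
        *--p = DIGIT_PAIRS[idx];
    }
    if (v >= 10) {                     /* final one or two digits */
        unsigned idx = v * 2;
        *--p = DIGIT_PAIRS[idx + 1];
        *--p = DIGIT_PAIRS[idx];
    } else {
        *--p = (char)('0' + v);
    }
    size_t len = (size_t)(tmp + sizeof tmp - p);
    memcpy(buf, p, len);
    buf[len] = '\0';
    return len;
}
```

The same pattern applies to other hot conversions (dates, decimals): replace a general-purpose library call with a type-specialized loop.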


Similar Jobs

Scopely - Senior Software Engineer (PHP)

Scopely

Bengaluru, Karnataka, India (Hybrid)
10 Hours ago
Blackshark - Senior DevOps Engineer

Blackshark

Graz, Styria, Austria (On-Site)
2 Weeks ago
Ness Digital - DevOps Specialist – CI/CD

Ness Digital

Prague, Czechia (On-Site)
4 Weeks ago
Budge Studios - Build Master

Budge Studios

Quebec, Canada (Hybrid)
1 Month ago
Google - Product Engineer, Machine Learning Accelerators

Google

Mexico (On-Site)
2 Weeks ago


Similar Skill Jobs

Vendavo - DevOps Engineer

Vendavo

Chennai, Tamil Nadu, India (Hybrid)
1 Day ago
Dun & Bradstreet - 2025 Summer Internship Program - Technology

Dun & Bradstreet

Jacksonville, Florida, United States (On-Site)
6 Months ago
ION - Senior DevSecOps Engineer, Italy

ION

Pisa, Tuscany, Italy (On-Site)
6 Months ago
GoReel - DevOps Lead

GoReel

Romania (Remote)
1 Month ago
NVIDIA - Senior AI-HPC Cluster Engineer

NVIDIA

Westford, Massachusetts, United States (Hybrid)
1 Month ago
Hacksaw Studios - Client Platform Developer

Hacksaw Studios

Stockholm, Stockholm County, Sweden (On-Site)
2 Weeks ago
GoReel - Senior DevOps

GoReel

Bratislava, Bratislava Region, Slovakia (Hybrid)
2 Weeks ago
NewGlobe - Senior DevOps Engineer

NewGlobe

Lagos, Lagos, Nigeria (On-Site)
10 Hours ago
Every matrix - Application Security Engineer

Every matrix

Bucharest, Bucharest, Romania (Hybrid)
3 Months ago
GoDaddy - Senior Site Reliability Engineer - Linux

GoDaddy

(Remote)
9 Hours ago

