The Modern DevOps Lifecycle
While DevOps is here to stay, as the years pass, we must continuously assess and seek improvements to our existing software processes, systems, and culture, and DevOps is no exception to that rule. With business needs and customer demands constantly shifting, our technology, mindsets, and architecture must shift as well in order to keep pace. Now is the time for the movement that's all about "shifting left" to make a shift of its own.

In our annual DevOps Trend Report, we explore both its fundamental principles and the emerging topics, methodologies, and challenges surrounding the engineering ecosystem. Within our "Key Research Findings" and featured articles from our expert community members, readers will find information on core DevOps topics as well as new insights on what's next for DevOps in 2024 and beyond. Join us to learn about the state of CI/CD pipelines, the impact of technical debt, patterns for supply chain management and DevOps, the rise of platform engineering, and even more!
Artificial intelligence (AI) holds vast potential for societal and industrial transformation. However, ensuring AI systems are safe, fair, inclusive, and trustworthy depends on the quality and integrity of the data upon which they are built. Biased datasets can produce AI models that perpetuate harmful stereotypes, discriminate against specific groups, and yield inaccurate or unreliable results. This article explores the complexities of data bias, outlines practical mitigation strategies, and delves into the importance of building inclusive datasets for the training and testing of AI models [1].

Understanding the Complexities of Data Bias

Data plays a key role in the development of AI models, and data bias can infiltrate AI systems in various ways. Here's a breakdown of the primary types of data bias, along with real-world examples [1, 2]:

Selection bias: Exclusion or under/over-representation of certain groups. Examples: a facial recognition system with poor performance on darker-skinned individuals due to limited diverse representation in the training data; a survey-based model primarily reflecting urban populations, making it unsuitable for nationwide resource allocation.
Information bias: Errors, inaccuracies, missing data, or inconsistencies. Examples: outdated census data leading to inaccurate neighborhood predictions; incomplete patient history affecting diagnoses made by medical AI.
Labeling bias: Subjective interpretations and unconscious biases in how data is labeled. Examples: historical bias encoded in image labeling, leading to harmful misclassifications; subjective evaluation criteria in a credit risk model, unintentionally disadvantaging certain socioeconomic groups.
Societal bias: Existing inequalities, discriminatory trends, and stereotypes reflected in data. Examples: word embeddings encoding gender biases from historical text data; AI loan approval systems inadvertently perpetuating past discriminatory lending practices.

Consequences of Data Bias

Biased AI models can have far-reaching implications:

Discrimination: AI systems may discriminate based on protected attributes such as race, gender, age, or sexual orientation.
Perpetuation of stereotypes: Biased models can reinforce and amplify harmful societal stereotypes, further entrenching them within decision-making systems.
Inaccurate or unreliable results: AI models built on biased data may produce significantly poorer or unfair results for specific groups or contexts, diminishing their utility, value, and trustworthiness.
Erosion of trust: The discovery of bias in AI models can damage public trust, delaying beneficial technology adoption.

Strategies for Combating Bias

Building equitable AI requires a multi-pronged approach involving tools, planning, transparency, and human oversight:

Bias mitigation tools: Frameworks like IBM AI Fairness 360 offer algorithms and metrics to identify and reduce bias throughout the AI development lifecycle.
Fairness thresholds: Techniques such as statistical parity or equal opportunity establish quantitative fairness goals (a small worked example appears at the end of this article).
Data augmentation: Oversampling techniques and synthetic data generation can help address the underrepresentation of specific groups, improving model performance.
Data Management Plans (DMPs): A comprehensive DMP ensures data integrity and outlines collection, storage, security, and sharing protocols.
Datasheets: Detailed documentation of dataset characteristics, limitations, and intended uses promotes transparency and aids in informed decision-making [3].
Human-in-the-loop: AI models should be complemented by human oversight and validation to ensure safe, ethical outcomes and maintain accountability.
Advanced techniques: For complex scenarios, explore re-weighting, re-sampling, adversarial learning, counterfactual analysis, and causal modeling for bias reduction.

Guidance on Data Management Plans (DMPs)

While a data management plan may sound like a simple document, a well-developed DMP can make a real difference in reducing bias and supporting safe AI development:

Ethical considerations: DMPs should explicitly address privacy, informed consent, potential bias sources, and the potential for disproportionate impact.
Data provenance: Document origin, transformations, and ownership to ensure auditability over time.
Version control: Maintain clear versioning systems for datasets to enable reproducibility and track changes.

Evolving Datasheets for Transparency

Knowing how, and on what data, AI models were trained makes it easier to evaluate and address claims about them. Datasheets play a major role here, as they help provide the following:

Motivational transparency: Articulate the dataset's creation purpose, intended uses, and known limitations [3].
Detailed composition: Provide statistical breakdowns of data features, correlations, and potential anomalies [3].
Comprehensive collection process: Describe sampling methods, equipment, sources of error, and biases introduced at this stage.
Preprocessing: Document cleaning, transformation steps, and anonymization techniques.
Uses and limitations: Explicitly outline suitable applications and scenarios where ethical concerns or bias limitations are present [3].

AI Fairness Is a Journey

Achieving safe AI is an ongoing endeavor. Regular audits, external feedback mechanisms, and a commitment to continual improvement in response to evolving societal norms are vital for building trustworthy and equitable AI systems.

References

1. Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447-453.
2. Rajkomar, A., Hardt, M., Howell, M. D., Corrado, G., & Chin, M. H. (2018). Ensuring fairness in machine learning to advance health equity. Annals of Internal Medicine, 169(12), 866-872.
3. Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., Spitzer, E., Raji, I. D., & Gebru, T. (2019). Model cards for model reporting. Proceedings of the Conference on Fairness, Accountability, and Transparency, 220-229.
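To make the "fairness thresholds" strategy mentioned above concrete, here is a minimal, illustrative sketch of the statistical parity difference metric. The group names, counts, and the 0.1 threshold are made up for demonstration; real audits would use a dedicated toolkit such as IBM AI Fairness 360.

Java
// Statistical parity difference (SPD): the difference between the rates of
// favorable outcomes received by two groups. An SPD close to 0 suggests
// parity on this one metric; the acceptable threshold is a policy decision,
// not a technical constant.
public class StatisticalParityExample {

    static double favorableRate(int favorableOutcomes, int groupSize) {
        return (double) favorableOutcomes / groupSize;
    }

    public static void main(String[] args) {
        // Hypothetical loan-approval counts for two demographic groups
        double rateGroupA = favorableRate(80, 100); // 0.80
        double rateGroupB = favorableRate(60, 100); // 0.60

        double spd = rateGroupA - rateGroupB;       // 0.20
        System.out.printf("Statistical parity difference: %.2f%n", spd);

        // Flag the model for review if the gap exceeds our chosen threshold
        if (Math.abs(spd) > 0.1) {
            System.out.println("Threshold exceeded: investigate potential bias.");
        }
    }
}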
Are you ready to get started with cloud-native observability using telemetry pipelines? This article is part of a series exploring a workshop that guides you through the open source project Fluent Bit: what it is, a basic installation, and setting up a first telemetry pipeline project. Learn how to manage your cloud-native data from source to destination using the telemetry pipeline phases, covering collection, aggregation, transformation, and forwarding from any source to any destination.

Since Chronosphere acquired the capabilities for integrating telemetry pipelines, I've been digging into how this works and the use cases it solves, and having a lot of fun with the underlying CNCF project, Fluent Bit. This workshop is the result of my sharing how to get started with telemetry pipelines and all that you can do with Fluent Bit. This first article in the series provides an introduction to Fluent Bit, where we gain an understanding of its role in the cloud-native observability world. You can find more details in the accompanying workshop lab.

Before we get started, let's get a baseline for defining cloud-native observability pipelines. As noted in a recent trend report: Observability pipelines provide real-time filtering, enrichment, normalization, and routing of telemetry data.

The rise in the amount of data being generated in cloud-native environments has become a burden for the teams trying to manage it all, as well as for organizations' budgets. Organizations are searching for more control over all this telemetry data, from collecting, processing, and routing to storing and querying. Data pipelines have gained traction in helping organizations deal with these challenges by providing a powerful way to lower ingestion volumes and reduce data costs.

One of the benefits is that telemetry pipelines act as a telemetry gateway between cloud-native data and organizations. They perform real-time filtering, enrichment, normalization, and routing to cheaper storage, reducing dependencies on expensive and often proprietary storage solutions. Another plus for organizations is the ability to reformat collected data on the fly, often bridging the gap between legacy or non-standards-based data structures and current standards. They can achieve this without having to update code, re-instrument, or redeploy existing applications and services.

Telemetry Pipelines

This workshop focuses solely on Fluent Bit as the open-source telemetry pipeline project. From the project documentation: Fluent Bit is an open-source telemetry agent specifically designed to efficiently handle the challenges of collecting and processing telemetry data across a wide range of environments, from constrained systems to complex cloud infrastructures. Managing telemetry data from various sources and formats can be a constant challenge, particularly when performance is a critical factor.

While the term "observability pipelines" is thrown about to cover all kinds of general pipeline activities, the focus in this workshop is on telemetry pipelines. This is due to our focus on getting all different types of telemetry from their origins to the destinations we desire, and as noted in the previously referenced trend report: Telemetry pipelines provide real-time filtering, enrichment, normalization, and routing of telemetry data.
Rather than serving as a drop-in replacement, Fluent Bit enhances the observability strategy for your infrastructure by adapting and optimizing your existing logging layer, as well as metrics and trace processing. Furthermore, Fluent Bit supports a vendor-neutral approach, seamlessly integrating with other ecosystems such as Prometheus and OpenTelemetry. Fluent Bit can be deployed as an edge agent for localized telemetry data handling or utilized as a central aggregator or collector for managing telemetry data across multiple sources and environments. Designed for performance and low resource consumption, Fluent Bit processes logs, metrics, and traces with speed, scale, and flexibility.

What About Fluentd?

First, there was Fluentd, a CNCF graduated project. It's an open-source data collector for building a unified logging layer. When installed, it runs in the background to collect, parse, transform, analyze, and store various types of data. Fluent Bit is a sub-project within the Fluentd ecosystem, considered a lightweight data forwarder for Fluentd and specifically designed for forwarding data from the edge to Fluentd aggregators. Both projects share similarities: Fluent Bit is fully designed and built on top of the best ideas of Fluentd's architecture and general design.

Understanding the Concepts

Before we dive into using Fluent Bit, it's important to understand the key concepts, so let's explore the following:

Event or record: Each incoming piece of data is considered an event or a record.
Filtering: The process of altering, enriching, or dropping an event
Tag: An internal string used by the router in later stages of our pipeline to determine which filter or output phases an event must pass through
Timestamp: Assigned to each event as it enters a pipeline and always present
Match: Represents a rule applied to events, examining their tags for matches
Structured message: The goal is to ensure that all messages have a structured format, defined as having keys and values.

Pipeline Phases

A telemetry pipeline is where data goes through various phases from collection to final destination. We can define or configure each phase to manipulate the data or the path it takes through our telemetry pipeline.

The first phase is INPUT, where Fluent Bit uses input plugins to gather information from specific sources. When an input plugin is loaded, it creates an instance that we can configure using the plugin's properties.

The second phase is PARSER, where unstructured input data is turned into structured data. Fluent Bit does this using parsers that we can configure to manipulate the unstructured data, producing structured data for the next phases of our pipeline.

The FILTER phase is where we modify, enrich, or delete any of the collected events. Fluent Bit provides many out-of-the-box filter plugins that can match, exclude, or enrich your structured data before it moves onward in the pipeline. Filters can be configured using the provided properties.

The BUFFER phase is where the data is stored, using in-memory or file system-based options. Note that when data reaches the buffer phase, it's in an immutable state (no more filtering), and that buffered data is not raw text but an internal binary representation used for storage. The sketch below ties the first phases together in a single configuration.
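To make these phases tangible, here is a minimal, illustrative classic-mode configuration. The file path, tag, and key names are assumptions for the sketch, and an output is included so the pipeline actually runs end to end; the routing and output phases themselves are described next.

Plain Text
# fluent-bit.conf -- a minimal sketch; paths and tags are illustrative
[SERVICE]
    Flush        1
    Parsers_File parsers.conf

# INPUT: read lines from a log file; PARSER: turn raw lines into structured records
[INPUT]
    Name   tail
    Path   /var/log/app.log
    Tag    app.log
    Parser app_json

# FILTER: enrich each matching event with a new key/value pair
[FILTER]
    Name   modify
    Match  app.*
    Add    environment staging

# OUTPUT: print processed events; routing happens by comparing Tag to Match
[OUTPUT]
    Name   stdout
    Match  app.*

# parsers.conf -- referenced by Parsers_File above
[PARSER]
    Name   app_json
    Format json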
The next phase is ROUTING, where Fluent Bit uses the previously discussed tag and match concepts to determine which output destinations to send data to. During the input phase, data is assigned a tag; during the routing phase, that tag is compared to the match rules in the output configurations. If a rule matches, the data is sent to that output destination.

The final phase is OUTPUT, where Fluent Bit uses output plugins to connect with specific destinations. These destinations can be databases, remote services, cloud services, and more. When an output plugin is loaded, it creates an instance that we can configure using the plugin's properties.

For code examples for these phases and more details about telemetry pipeline phases, see the workshop lab.

What's Next?

This article was an introduction to telemetry pipelines and Fluent Bit. This series continues with the next step in this workshop: installing Fluent Bit on your local machine from source or using container images. Stay tuned for more hands-on material to help you with your cloud-native observability journey.
Effective exception management is pivotal for maintaining the integrity and stability of software applications. Java's lambda expressions offer a concise means of expressing anonymous functions, yet handling exceptions within these constructs presents unique challenges. In this article, we'll delve into the nuances of managing exceptions within Java lambda expressions, exploring potential hurdles and providing practical strategies to overcome them.

Understanding Lambda Expressions in Java

Java 8 introduced lambda expressions, revolutionizing the way we encapsulate functionality as method arguments or create anonymous classes. Lambda expressions comprise parameters, an arrow (->), and a body, facilitating a more succinct representation of code blocks. Typically, lambda expressions are utilized with functional interfaces, which define a single abstract method (SAM).

Java
// Syntax of a lambda expression
(parameter_list) -> { lambda_body }

Exception Handling in Lambda Expressions

Lambda expressions are commonly associated with functional interfaces, most of which do not declare checked exceptions in their abstract methods. Consequently, dealing with operations that might throw checked exceptions within lambda bodies presents a conundrum. Consider the following example:

Java
interface MyFunction {
    void operate(int num);
}

public class Main {
    public static void main(String[] args) {
        MyFunction func = (num) -> {
            System.out.println(10 / num);
        };
        func.operate(0); // Division by zero
    }
}

In this scenario, dividing by zero triggers an ArithmeticException at run time. Since ArithmeticException is unchecked, the code compiles, but the lambda offers no built-in way to deal with the failure. The situation is stricter for checked exceptions: because the operate method in the MyFunction interface doesn't declare any, the compiler disallows throwing them from the lambda body without catching them there.

Workarounds for Exception Handling in Lambda Expressions

Leveraging Functional Interfaces With Checked Exceptions

One workaround involves defining functional interfaces that explicitly declare checked exceptions in their abstract methods.

Java
@FunctionalInterface
interface MyFunctionWithException {
    void operate(int num) throws Exception;
}

public class Main {
    public static void main(String[] args) {
        MyFunctionWithException func = (num) -> {
            if (num == 0) {
                throw new Exception("Division by zero");
            }
            System.out.println(10 / num);
        };
        try {
            func.operate(0);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Here, the MyFunctionWithException functional interface indicates that the operate method may throw an Exception, enabling external handling of the exception.

Utilizing Try-Catch Within the Lambda Body

Another approach involves enclosing the lambda body within a try-catch block to manage exceptions internally.

Java
interface MyFunction {
    void operate(int num);
}

public class Main {
    public static void main(String[] args) {
        MyFunction func = (num) -> {
            try {
                System.out.println(10 / num);
            } catch (ArithmeticException e) {
                System.out.println("Cannot divide by zero");
            }
        };
        func.operate(0);
    }
}

This method maintains the brevity of the lambda expression while encapsulating exception-handling logic within the lambda body itself.

Employing Optional for Exception Handling

Java 8 introduced the Optional class, providing a mechanism to wrap potentially absent values. This feature can be harnessed for exception handling within lambda expressions.
Java
import java.util.Optional;

interface MyFunction {
    void operate(int num);
}

public class Main {
    public static void main(String[] args) {
        MyFunction func = (num) -> {
            Optional<Integer> result = divideSafely(10, num);
            result.ifPresentOrElse(
                System.out::println,
                () -> System.out.println("Cannot divide by zero")
            );
        };
        func.operate(0);
    }

    private static Optional<Integer> divideSafely(int dividend, int divisor) {
        try {
            return Optional.of(dividend / divisor);
        } catch (ArithmeticException e) {
            return Optional.empty();
        }
    }
}

In this example, the divideSafely() helper method encapsulates the division operation within a try-catch block. If successful, it returns an Optional containing the result; otherwise, it returns an empty Optional. The ifPresentOrElse() method (added in Java 9) within the lambda expression facilitates handling both successful and exceptional scenarios.

Incorporating multiple Optional instances within exception-handling scenarios can further enhance the robustness of Java lambda expressions. Let's consider an example where we have two values that we need to divide, and both operations are wrapped within Optional instances for error handling:

Java
import java.util.Optional;

interface MyFunction {
    void operate(int num1, int num2);
}

public class Main {
    public static void main(String[] args) {
        MyFunction func = (num1, num2) -> {
            Optional<Integer> result1 = divideSafely(10, num1);
            Optional<Integer> result2 = divideSafely(20, num2);

            result1.ifPresentOrElse(
                res1 -> result2.ifPresentOrElse(
                    res2 -> System.out.println("Result of division: " + (res1 / res2)),
                    () -> System.out.println("Cannot divide second number by zero")
                ),
                () -> System.out.println("Cannot divide first number by zero")
            );
        };
        func.operate(0, 5);
    }

    private static Optional<Integer> divideSafely(int dividend, int divisor) {
        try {
            return Optional.of(dividend / divisor);
        } catch (ArithmeticException e) {
            return Optional.empty();
        }
    }
}

In this example, the operate method takes two integer parameters, num1 and num2. Inside the lambda expression assigned to func, we have two division operations, each wrapped within its respective Optional instance: result1 and result2. We use nested ifPresentOrElse calls to handle both present (successful) and absent (exceptional) cases for each division operation. If both results are present, we perform the division operation and print the result. If either of the results is absent (due to division by zero), an appropriate error message is printed. This demonstrates how multiple Optional instances can be effectively utilized within Java lambda expressions to handle exceptions and ensure the reliability of operations involving multiple values.

Chained Operations With Exception Handling

Suppose we have a chain of operations where each operation depends on the result of the previous one. We want to handle exceptions gracefully within each step of the chain.
Here's how we can achieve this:

Java
import java.util.Optional;

public class Main {
    public static void main(String[] args) {
        // Chain of operations: divide by 2, then add 10, then divide by 5
        process(20, num -> divideSafely(num, 2))
            .flatMap(result -> process(result, res -> addSafely(res, 10)))
            .flatMap(result -> process(result, res -> divideSafely(res, 5)))
            .ifPresentOrElse(
                System.out::println,
                () -> System.out.println("Error occurred in processing")
            );
    }

    private static Optional<Integer> divideSafely(int dividend, int divisor) {
        try {
            return Optional.of(dividend / divisor);
        } catch (ArithmeticException e) {
            return Optional.empty();
        }
    }

    private static Optional<Integer> addSafely(int num1, int num2) {
        // Simulating a possible checked exception scenario
        if (num1 == 0) {
            return Optional.empty();
        }
        return Optional.of(num1 + num2);
    }

    // Runs the given operation and returns its result; any exception thrown
    // by the operation collapses the step into an empty Optional.
    private static Optional<Integer> process(int value, MyFunction function) {
        try {
            return function.operate(value);
        } catch (Exception e) {
            return Optional.empty();
        }
    }

    interface MyFunction {
        Optional<Integer> operate(int num) throws Exception;
    }
}

In this illustration, the process function accepts an integer and a lambda (typed by the MyFunction interface). It executes the operation specified by the lambda and returns the result wrapped in an Optional. We link numerous process calls together, where each relies on the outcome of the preceding one. The flatMap function is employed to manage potential empty Optional values and prevent the nesting of Optional instances. If any step within the sequence faces an error, the error message is displayed.

Asynchronous Exception Handling

Imagine a scenario where we need to perform operations asynchronously within lambda expressions and handle any exceptions that occur during execution:

Java
import java.util.concurrent.CompletableFuture;

public class Main {
    public static void main(String[] args) {
        CompletableFuture<Void> pipeline =
            CompletableFuture.supplyAsync(() -> divideAsync(10, 2))
                .thenApplyAsync(result -> addAsync(result, 5))
                .thenApplyAsync(result -> divideAsync(result, 0))
                .exceptionally(ex -> {
                    System.out.println("Error occurred: " + ex.getMessage());
                    return null; // Recover gracefully with a fallback value
                })
                .thenAccept(System.out::println); // Print final result

        pipeline.join(); // Keep main alive until the async chain completes
    }

    private static int divideAsync(int dividend, int divisor) {
        return dividend / divisor;
    }

    private static int addAsync(int num1, int num2) {
        return num1 + num2;
    }
}

In this example, we use CompletableFuture to perform asynchronous operations. Each step in the chain (supplyAsync, thenApplyAsync) represents an asynchronous task, and we chain them together. The exceptionally method allows us to handle any exceptions that occur during the execution of the asynchronous tasks. If an exception occurs, the error message is printed, the remaining transformation steps are skipped, and exceptionally supplies a fallback value that flows on to thenAccept. The final join() simply blocks until the asynchronous chain has finished.

Conclusion

Navigating exception handling in the context of Java lambdas requires innovative approaches to preserve the succinctness and clarity of lambda expressions. Strategies such as exception wrapping, custom functional interfaces, the "try" pattern, and external libraries offer flexible solutions. Whether it's through leveraging functional interfaces with checked exceptions, encapsulating exception handling within try-catch blocks inside lambda bodies, or utilizing constructs like Optional, mastering exception handling in lambda expressions is essential for building resilient Java applications.
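Of the strategies named above, exception wrapping is the one not shown earlier, so here is a minimal sketch. The ThrowingConsumer interface and unchecked() helper are illustrative names, not a standard API:

Java
import java.util.List;
import java.util.function.Consumer;

public class WrappingExample {

    @FunctionalInterface
    interface ThrowingConsumer<T> {
        void accept(T t) throws Exception;
    }

    // Adapts a throwing lambda to a plain Consumer by rethrowing any
    // checked exception as an unchecked RuntimeException.
    static <T> Consumer<T> unchecked(ThrowingConsumer<T> consumer) {
        return t -> {
            try {
                consumer.accept(t);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        };
    }

    public static void main(String[] args) {
        // Thread.sleep declares the checked InterruptedException, which a
        // plain Consumer could not propagate; the wrapper makes it usable here.
        List.of(50L, 100L).forEach(unchecked(Thread::sleep));
        System.out.println("Done");
    }
}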
Essentially, while lambda expressions streamline how we express code, implementing effective exception-handling techniques is crucial to fortifying the resilience and dependability of Java applications against unforeseen errors. With the approaches discussed in this article, developers can confidently navigate exception management within lambda expressions, thereby strengthening the overall integrity of their codebases.
Have you ever wondered how data flows within a software system? How is information processed and transformed, and how does it deliver value? Data Flow Diagrams (DFDs) are a "visual language" that can answer such questions. An important tool for understanding how data moves in a software system, DFDs provide a visual representation of the flow of data from its entry point to its final destination and highlight data transformations along the way. Whether you're a tester, a seasoned developer, a budding programmer, or a stakeholder involved in system design and architecture, understanding DFDs unlocks a valuable skill set.

This article provides fundamental knowledge about DFDs, highlighting their benefits and guiding you on how to leverage them effectively. We start with DFD basics and a set of steps on how to create a DFD. An inventory management system is used as an example, where we design a sample of basic test cases based on DFDs. The benefits and limitations of DFDs are also explored. We finish by presenting tools available for DFDs, highlighting their strengths and weaknesses.

The Orchestra

A useful metaphor is that of the orchestra, where data flows like musical notes played by different instruments. DFDs act as the conductor's score, outlining the movement and transformation of these notes. Here's how the orchestra translates to DFD components:

Musicians (data sources): The violins, flutes, and other instruments that provide the initial musical data (e.g., customer records from a database, sensor readings from a device)
Audience (data destinations): The audience represents the final recipients of the music, just like reports generated for management or data sent to another system for further processing.
Sheet music stands (data stores): The stands holding the sheet music are like databases, files, or in-memory buffers that temporarily or permanently store data.
Musical phrases (data flows): The flowing melodies and harmonies translate to data flows, depicted as arrows connecting different components.
The conductor (data processes): Just as the conductor guides the musicians and shapes the music, data processes represent transformations happening to the data as it flows (e.g., calculations, filtering, data manipulation).

By analyzing the score (DFD), we can understand the complete musical journey. Similarly, understanding data flow helps us comprehend how information moves and evolves within a system.

How To Create a DFD

In short, creating a DFD is an iterative process that can benefit from pairing activities and feedback from others. We start by defining scope, entities, and other parameters. We are done when we can answer the basic questions mentioned below.

1. Define the System Scope

The first step is to clearly define the boundaries of the system you're modeling. What are the system's functionalities? What are the external entities it interacts with? Having a well-defined scope ensures your DFD focuses on the relevant data flows within the system.

2. Identify External Entities

List all the external entities that interact with the system. Are there users entering data through a web interface? Does the system receive data from another system via an API? Each entity should be named clearly, reflecting its role in data exchange.

3. Pinpoint Data Flows

For each external entity, identify the data it sends to the system (inputs) and the data it receives from the system (outputs).
Label the data flows with descriptive names that capture the essence of the data being transferred.

4. Introduce Data Stores

Identify the data repositories within the system. What kind of data does the system store? Does it maintain a database of customer information or a log file of system activity? Each data store should be named appropriately, reflecting the data it holds.

5. Outline Data Processes

Define the transformations that occur on the data as it flows through the system. How is the customer order data processed to generate an invoice? How is the sensor data filtered before analysis? Each data process should be labeled with a clear description of its function.

6. Diagram the DFD

Once you have identified all the elements, it's time to visually represent them using a DFD tool or even a simple drawing tool. Use standard symbols for external entities (rectangles), data flows (arrows), data stores (cylinders), and data processes (rounded rectangles).

7. Level Up: Context and DFD Levels

A single DFD might not capture the entire system's complexity. Here's where the concept of DFD levels comes in:

Context diagram (Level 0): This high-level overview depicts the system as a single process interacting with external entities.
Level 1 DFD: This level decomposes the single process from the context diagram into more detailed sub-processes, showcasing data flows and data stores within the system.
Level 2 DFD (and beyond): Further decomposition can occur, focusing on specific sub-processes from Level 1 DFDs and providing even greater detail on data flow within those sub-processes.

8. Refine and Validate

Once your DFD is drafted, review it for accuracy and completeness. Do the data flows connect to the correct entities and processes? Are the data transformations within processes clearly defined? Seek feedback from stakeholders to ensure the DFD accurately reflects the system's intended behavior.

How To Level Up

To help grasp the main idea here, another metaphor may help. Each city has neighborhoods, streets, and hidden alleyways. A single map might not capture every detail. Similarly, a single DFD might not encapsulate the detailed data flow within a complex system. This is where DFD levels come into play, offering a hierarchical approach to visualizing data flow at different granularities.

1. Context Diagram (Level 0)

The big picture: This is a city map viewed from a helicopter. It depicts the entire system as a single, high-level process. This process interacts with various external entities, represented by rectangles. Data flows (arrows) showcase the exchange of information between the system and these entities.
Focus: The context diagram provides a broad overview, highlighting the system's purpose and its interaction with the external world. It's ideal for high-level discussions and initial system understanding.

2. Level 1 DFD

Delving deeper: Now we descend into the city! This level decomposes the single process from the context diagram into more detailed sub-processes. These sub-processes, represented by rounded rectangles, showcase the internal workings of the system.
Data flows and stores: Level 1 DFDs depict data flows (arrows) connecting these sub-processes. They also introduce data stores (cylinders) representing the system's internal repositories where data is temporarily or permanently held (e.g., databases, files).
Increased detail: This level offers a more granular view of how data flows within the system.
It reveals the functionalities performed by each sub-process and how they interact with data stores.

3. Level 2 DFD (and Beyond)

Zooming in on specific areas: Here, we are exploring a specific neighborhood within the city. Level 2 DFDs (and potentially even further levels) take a sub-process from the Level 1 DFD and break it down even further, focusing on the data flow within that specific sub-process.
Greater clarity for complex functions: This level is particularly useful for complex functionalities within the system. By decomposing them into smaller, more manageable components, DFDs provide a clearer understanding of how data is manipulated and transformed within each sub-process.

Choosing the Right Level

The appropriate DFD level depends on the system's complexity and the level of detail required.

Context diagram: Ideal for initial system understanding and high-level communication
Level 1 DFD: Provides a good balance between overview and detail; useful for design and development discussions
Level 2 DFD (and beyond): Focuses on specific functionalities; helpful for in-depth analysis and documentation

By leveraging DFD levels, you can create a comprehensive set of diagrams that effectively capture the data flow within a system, from a high-level overview to a granular examination of specific processes. This layered approach ensures clear communication and understanding for all stakeholders involved in system design and development.

Inventory Management System

Let's assume that our software manages the inventory for a small store. Here's a simplified breakdown of its functionalities:

User adds new items: The user interacts with the system to add new items to the inventory. This involves providing details like item name, description, quantity, and price.
Inventory data validation: The system validates the entered data, ensuring required fields are filled and data formats are correct (e.g., positive quantities, valid price format).
Inventory update: If validation passes, the new item is added to the inventory database, or an existing item's quantity is updated.
Item search: The user can search for items in the inventory by name or other criteria.
Inventory report: The user can generate a report summarizing the current inventory status, including item names, quantities, and total values.

A context diagram for the inventory management system could simply depict that a user interacts with the system by exchanging inventory data, as shown below.

A Level 1 DFD may decompose the context diagram, as shown below. It focuses on core functionalities. You can extend it to include additional data flows, such as user authentication or managing low-stock alerts.

A Level 2 DFD may focus on and expand any item from the Level 1 DFD. As an example, we will create a Level 2 DFD by focusing on the "Add New Item" functionality from the Level 1 DFD, with emphasis on data validation for new items. You can extend it to include data processing for adding the item to the database, for example. The specific validation rules can be further customized based on your system's requirements (e.g., validating price range or description length). Level 2 DFDs can be created for other functionalities like "Item Search" or "Inventory Report" to provide a more granular view of data flow within those processes.

Test Case Design Based on DFDs

Now, let's leverage the DFD to design test cases that cover various scenarios and potential data flows:
1. User Interface Testing

Test Case 1.1: Enter valid item details (name, description, quantity > 0, price > 0). Expected result: The item is successfully added to the inventory.
Test Case 1.2: Leave a field blank (e.g., no item name). Expected result: The system displays an error message prompting the user to fill in the required field.
Test Case 1.3: Enter an invalid quantity (e.g., a negative number). Expected result: The system displays an error message indicating an invalid quantity format.
Test Case 1.4: Enter an invalid price format (e.g., letters instead of numbers). Expected result: The system displays an error message indicating an invalid price format.

2. Inventory Data Validation Testing

Test Case 2.1: Enter a duplicate item name for a new item. Expected result: The system displays an error message indicating that the item already exists.
Test Case 2.2: Enter a very long item name (exceeding a defined limit). Expected result: The system displays an error message indicating that the item name is too long.

3. Inventory Update Testing

Test Case 3.1: Add a new item with a valid quantity. Expected result: The item is added to the inventory database with the correct quantity.
Test Case 3.2: Update the quantity of an existing item. Expected result: The inventory database is updated with the new quantity for the item.
Test Case 3.3: Attempt to update the quantity of a non-existent item. Expected result: The system displays an error message indicating that the item cannot be found.

4. Item Search Testing

Test Case 4.1: Search for an item by its exact name (case-sensitive). Expected result: The system accurately retrieves the item information.
Test Case 4.2: Search for an item by a partial name match (case-insensitive). Expected result: The system retrieves all items that match the partial name (if applicable).
Test Case 4.3: Search for an item that doesn't exist in the inventory. Expected result: The system displays a message indicating that no items match the search criteria.

5. Inventory Report Testing

Test Case 5.1: Generate a report when the inventory is empty. Expected result: The report displays a message indicating that there are no items in the inventory.
Test Case 5.2: Generate a report with various items in the inventory. Expected result: The report accurately lists all items with their names, quantities, and calculated total values.
Test Case 5.3: Generate a report in a specific format (e.g., CSV, PDF). Expected result: The report is generated in the requested format with correct data representation.

Remember: this is a simplified example, and the specific test cases will vary depending on your functionality. DFDs are a valuable tool for identifying key data flows and system components. By analyzing these flows, you can design test cases to ensure the system functions as expected. A sketch of how such cases might be automated follows below.
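As an illustration, here is a minimal sketch of how Test Cases 1.1 and 1.3 might be automated with JUnit 5. The InventoryService class and its addItem() signature are hypothetical stand-ins for whatever API your system exposes:

Java
import java.math.BigDecimal;

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertDoesNotThrow;
import static org.junit.jupiter.api.Assertions.assertThrows;

// Hypothetical system under test: rejects invalid input with IllegalArgumentException
class InventoryService {
    void addItem(String name, String description, int quantity, BigDecimal price) {
        if (name == null || name.isBlank()) {
            throw new IllegalArgumentException("Item name is required");
        }
        if (quantity <= 0) {
            throw new IllegalArgumentException("Quantity must be positive");
        }
        if (price == null || price.signum() <= 0) {
            throw new IllegalArgumentException("Price must be positive");
        }
        // ... persist the item to the inventory store
    }
}

class InventoryServiceTest {
    private final InventoryService inventory = new InventoryService();

    @Test // Test Case 1.1: valid item details are accepted
    void addsItemWithValidDetails() {
        assertDoesNotThrow(() ->
            inventory.addItem("Notebook", "A5, ruled", 10, new BigDecimal("2.50")));
    }

    @Test // Test Case 1.3: a negative quantity is rejected
    void rejectsNegativeQuantity() {
        assertThrows(IllegalArgumentException.class, () ->
            inventory.addItem("Notebook", "A5, ruled", -1, new BigDecimal("2.50")));
    }
}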
Benefits of Data Flow Diagrams

Creating DFDs offers a multitude of benefits for software development projects:

Enhanced communication: DFDs provide a clear "visual language" that stakeholders, both technical and non-technical, can understand. This improves communication and collaboration during the design phase.
Improved system design: By visualizing data flow, potential bottlenecks or inefficiencies in data processing can be identified early on, allowing for a more optimized system design.
Clearer data requirements: DFDs highlight the data required by the system to function effectively. This helps to define data storage needs and design appropriate database structures.
Documentation and maintenance: DFDs serve as valuable documentation throughout the development lifecycle. They provide a reference point for project managers, architects, developers, testers, and future maintainers, ensuring a clear understanding of the system's data flow.
Early error detection: By visualizing data flow, potential inconsistencies or missing data transformations can be identified before coding commences. This leads to fewer errors during development and reduces the need for costly rework later in the project.

Limitations of Data Flow Diagrams

Here are some of the limitations of DFDs:

Limited control flow representation: DFDs primarily focus on data flow and transformations. They don't explicitly depict the order of operations or the decision logic within processes. While some tools might offer symbols for conditional flows, DFDs aren't ideal for representing complex control flow logic, which can be crucial for understanding how a system behaves under different conditions.
Data structure complexity: DFDs represent data flows with simple labels, which might not adequately capture the structure and complexity of the data. For systems dealing with complex data objects or hierarchical relationships, this may be an issue.
Scalability for large systems: For very large and complex systems, creating and managing a single, comprehensive DFD can become cumbersome. The sheer number of elements and data flows can make the diagram difficult to understand and maintain.
Focus on functionality, not implementation: DFDs primarily depict the "what" of a system, focusing on functionalities and data flow. They don't directly translate to specific code or implementation details. Additional documentation might be needed to bridge the gap between the DFD and the actual system implementation.
Limited user interaction modeling: While DFDs can represent basic user interactions with the system as external entities, they don't depict the user interface or user experience (UX) aspects in detail. Additional tools or diagrams might be required to capture these aspects effectively.

In spite of these limitations, DFDs remain a valuable tool for system design and communication. Here are some strategies to mitigate these limitations:

Use swimlane diagrams for complex control flow: These diagrams can visually represent decision points and alternative paths within a process.
Create separate DFDs for different functionalities: Breaking down a large system into smaller, more manageable DFDs can improve readability and maintainability.
Combine DFDs with other tools: Use DFDs in conjunction with other diagrams, like structure charts or user flow diagrams, to provide a more comprehensive picture of the system.
Document data structures separately: Create additional documentation that details the structure and relationships within your data objects.

By understanding these limitations and employing appropriate strategies, you can leverage DFDs effectively to design, document, and communicate system functionalities.

Tools for Data Flow Diagrams

Here are the most popular tools for creating DFDs, along with a comparison of their key features:
1. Lucidchart

Strengths:
Cloud-based: Accessible from any device with a web browser; good for remote collaboration
User-friendly: Drag-and-drop functionality simplifies DFD creation.
Real-time collaboration: Enables multiple users to work on the same DFD simultaneously
Integration: Connects seamlessly with various project management and development tools
Rich template library: Offers a comprehensive library of DFD symbols and templates

Weaknesses:
Cost: The free plan has limited features and storage space; advanced features require paid subscriptions.

2. Microsoft Visio

Strengths:
Industry standard: Widely recognized and used across various industries
Extensive library: Provides a vast collection of DFD symbols and templates for detailed diagrams
Customization options: Allows extensive customization of shapes, lines, and styles
Integration: Offers strong integration with other Microsoft Office products

Weaknesses:
Cost: Can be expensive, especially for single users
Learning curve: Steeper learning curve compared to simpler tools
Overkill for basic DFDs: May be more than is needed for creating simple DFDs

3. Draw.io (Now diagrams.net)

Strengths:
Free and open source: No licensing costs and accessible to everyone
Cross-platform: Available as a web-based interface and a desktop app
Large symbol library: Offers a wide range of shapes and templates, including DFD symbols
Export options: Allows exporting diagrams in various image formats (PNG, JPG) and SVG for further editing

Weaknesses:
Limited collaboration: Collaboration features are less robust compared to some paid tools.
Fewer advanced features: Lacks some advanced features like shape customization or data import/export

4. yEd Graph Editor

Strengths:
Free: Another no-cost option for DFD creation
Flexibility: Offers flexibility for creating custom shapes and layouts
Data import/export: Supports importing/exporting data in various formats, useful for complex diagrams

Weaknesses:
Learning curve: The user interface might be less intuitive compared to drag-and-drop tools.
Collaboration: Lacks some of the collaborative features of cloud-based tools

5. Microsoft Word

Strengths:
Readily available: Most users already have access to Microsoft Word, making it a convenient option.
Basic functionality: Offers basic shape-drawing capabilities and a limited set of DFD symbols
Documentation: Can be sufficient for creating simple DFDs for documentation purposes within Word documents

Weaknesses:
Limited capabilities: Not a dedicated diagramming tool, making it cumbersome for complex DFDs
Missing features: Lacks advanced features like shape customization, layout options, and robust collaboration

Choosing the Right Tool

The best tool for you depends on your specific needs. Here's a quick guide:

For simple DFDs with limited collaboration needs: Draw.io or Word might be sufficient.
For complex DFDs and collaboration: Lucidchart and Visio are good choices with advanced features.
For budget-conscious users: Draw.io and yEd Graph Editor are free alternatives.
For Microsoft Office users who need basic DFD creation: Microsoft Word can suffice for simple diagrams.

Consider the factors mentioned above and try these tools to see which one best suits your workflow and project requirements.

Wrapping Up

Data Flow Diagrams are a cornerstone of effective software development. By mastering DFD creation, you gain a powerful tool for understanding, visualizing, and documenting the flow of data within a system.
This empowers you to design efficient data processing workflows, identify potential issues early on, and communicate clearly with stakeholders throughout the development process.
Extra panel in the link: https://turnoff.us/geek/too-many-indexes/#extra_panel
Angular, a powerful framework for building dynamic web applications, is known for its component-based architecture. However, one aspect that often puzzles new developers is the fact that Angular components do not have a display: block style by default. This article explores the implications of this design choice, its impact on web development, and how developers can effectively work with it.

The world of front-end development is replete with frameworks that aim to provide developers with robust tools to build interactive and dynamic web applications. Among these, Angular stands out as a powerful platform, known for its comprehensive approach to constructing an application's architecture. Particularly noteworthy is the way Angular handles components, the fundamental building blocks of Angular applications.

Understanding Angular Components

In Angular, components are the fundamental building blocks that encapsulate data binding, logic, and template rendering. They play a crucial role in defining the structure and behavior of your application's interface.

Definition and Role

A component in Angular is a TypeScript class decorated with @Component(), where you define its application logic. Accompanying this class is a template, typically an HTML file, that determines the component's visual representation, and optionally CSS files for styling. The component's role is multifaceted: it manages the data and state necessary for the view, handles user interactions, and can be reused throughout the application.

TypeScript
import { Component } from '@angular/core';

@Component({
  selector: 'app-my-component',
  templateUrl: './my-component.component.html',
  styleUrls: ['./my-component.component.css']
})
export class MyComponent {
  // Component logic goes here
}

Angular's Shadow DOM

Angular components rely on Shadow DOM-style encapsulation to keep their markup and styles independent of other components. By default, Angular emulates this behavior rather than using the browser's native Shadow DOM, although native Shadow DOM can be enabled per component (see the short sketch below). Either way, styles defined in one component will not leak out and affect other parts of the application, because the encapsulation creates a boundary around the component. As a developer, it's essential to understand the structure and capabilities of Angular components to fully leverage the power of the framework. Recognizing this inherent encapsulation is particularly important when considering how components are displayed and styled within an application.

Display Block: The Non-Default in Angular Components

Angular components differ from standard HTML elements in many ways, one of which is their default display behavior. Unlike many HTML elements, which come with a browser default display value of block or inline, Angular components ship with no display value of their own (browsers render unknown elements, including component host elements, as inline). This is intentional and plays an important role in Angular's encapsulation philosophy and component rendering process.

Comparison With HTML Elements

Standard HTML elements like <div>, <p>, and <h1> come with default styling that can include the CSS display: block property. This means that when you drop a <div> into your markup, it naturally takes up the full width available to it, creating a "block" on the page.

<!-- Standard HTML div element -->
<div>This div is a block-level element by default.</div>

In contrast, Angular components start without any assumptions about their display property. They don't inherently behave as block or inline elements; they are essentially "display-agnostic" until specified.
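As a quick aside on the encapsulation mentioned above, here is a minimal sketch of how a component's encapsulation mode is controlled. The selector and styles are illustrative; Emulated is Angular's default:

TypeScript
import { Component, ViewEncapsulation } from '@angular/core';

@Component({
  selector: 'app-scoped-note',
  template: `<p class="note">These styles stay inside this component.</p>`,
  styles: [`.note { color: teal; }`],
  // Emulated (the default) rewrites selectors with generated attributes so
  // styles don't leak out; ShadowDom uses the browser's native Shadow DOM.
  encapsulation: ViewEncapsulation.Emulated
})
export class ScopedNoteComponent {}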
Rationale Behind the Non-Block Default

Angular's choice to diverge from the typical block behavior of HTML elements is deliberate. One reason is to encourage developers to consciously decide how each component should be displayed within the application's layout. It prevents unexpected layout shifts and the overwriting of global styles that may occur when components with block-level styles are introduced into existing content. By not setting a display property by default, Angular invites developers to think responsively and adapt their components to various screen sizes and layout requirements by setting explicit display styles that suit the component's purpose within the context of the application.

In the following section, we will explore how to work with the display properties of Angular components, ensuring that they fit seamlessly into your application's design with explicit and intentional styling choices.

Working With Angular's Display Styling

When building applications with Angular, understanding and properly implementing display styling is crucial for achieving the desired layout and responsiveness. Since Angular components come without a preset display rule, it's up to the developer to define how each component should be displayed within the context of the application.

1. Explicitly Setting Display Styles

You have complete control over how an Angular component is displayed by explicitly setting the CSS display property. This can be defined in the component's stylesheet, inline, or even dynamically through component logic.

CSS
/* app-example.component.css */
:host {
  display: block;
}

HTML
<!-- Inline style -->
<app-example-component style="display: block;"></app-example-component>

TypeScript
// Component logic setting display dynamically
export class ExampleComponent {
  @HostBinding('style.display') displayStyle = 'block';
}

Choosing to set your component's display style via the stylesheet ensures that you can leverage CSS's full power, including media queries for responsiveness.

2. Responsive Design Considerations

Angular's adaptability allows you to create responsive designs by combining explicit display styles with modern CSS techniques. Using media queries, flexbox, and CSS Grid, you can responsively adjust the layout of your components based on the viewport size.

CSS
/* app-example.component.css */
:host {
  display: grid;
  grid-template-columns: repeat(auto-fill, minmax(150px, 1fr));
}

@media (max-width: 768px) {
  :host {
    display: block;
  }
}

By setting explicit display values in stylesheets and using Angular's data-binding features, you can create a responsive and adaptive user interface. This level of control over styling reflects the thoughtful consideration that Angular brings to the development process, enabling you to create sophisticated, maintainable, and scalable applications.

Next, we will wrap up our discussion and revisit the key takeaways from working with Angular components and their display styling strategies.

Conclusion

Throughout this exploration of Angular components and their display properties, it's become apparent that Angular's choice to leave components without a block default is a purposeful design decision. This approach promotes a more thoughtful application of styles and supports encapsulation, a core principle within Angular's architecture. It steers developers toward crafting intentional and adaptive layouts, a necessity in the diverse landscape of devices and screen sizes.
By understanding Angular's component architecture and the reasoning behind its display styling choices, developers are better equipped to make informed decisions. Explicit display settings and responsive design considerations are not afterthoughts but integral parts of the design and development process when working with Angular. Embracing these concepts allows developers to fully leverage the framework's capabilities, leading to well-structured, maintainable, and responsive applications that stand the test of time and technology evolution. The information provided in this article aims to guide Angular developers in harnessing these tools effectively, ensuring that the user experiences they create are as robust as the components that compose them.
In Part 1 of this series, we looked at MongoDB, one of the most reliable and robust document-oriented NoSQL databases. Here in Part 2, we'll examine another quite unavoidable NoSQL database: Elasticsearch.

More than just a popular and powerful open-source distributed NoSQL database, Elasticsearch is first of all a search and analytics engine. It is built on top of Apache Lucene, the most famous search engine Java library, and is able to perform real-time search and analysis operations on structured and unstructured data. It is designed to handle large amounts of data efficiently.

Once again, we need to disclaim that this short post is by no means an Elasticsearch tutorial. Accordingly, the reader is strongly advised to extensively use the official documentation, as well as the excellent book "Elasticsearch in Action" by Madhusudhan Konda (Manning, 2023), to learn more about the product's architecture and operations. Here, we're just reimplementing the same use case as previously, but this time using Elasticsearch instead of MongoDB. So, here we go!

The Domain Model

The diagram below shows our customer-order-product domain model. This diagram is the same as the one presented in Part 1.

Like MongoDB, Elasticsearch is also a document data store and, as such, it expects documents to be presented in JSON notation. The only difference is that to handle its data, Elasticsearch needs to get it indexed. There are several ways that data can be indexed in an Elasticsearch data store: for example, piping it from a relational database, extracting it from a filesystem, streaming it from a real-time source, etc. But whatever the ingestion method might be, it eventually consists of invoking the Elasticsearch RESTful API via a dedicated client. There are two categories of such dedicated clients:

REST-based clients like curl, Postman, HTTP modules for Java, JavaScript, Node.js, etc.
Programming language SDKs (software development kits): Elasticsearch provides SDKs for the most commonly used programming languages, including but not limited to Java and Python.

Indexing a new document with Elasticsearch means creating it using a POST request against a special RESTful API endpoint named _doc. For example, the following request will create a new Elasticsearch index and store a new customer instance in it.

Plain Text
POST customers/_doc/
{
  "id": 10,
  "firstName": "John",
  "lastName": "Doe",
  "email": {
    "address": "john.doe@gmail.com",
    "personal": "John Doe",
    "encodedPersonal": "John Doe",
    "type": "personal",
    "simple": true,
    "group": true
  },
  "addresses": [
    {
      "street": "75, rue Véronique Coulon",
      "city": "Coste",
      "country": "France"
    },
    {
      "street": "Wulfweg 827",
      "city": "Bautzen",
      "country": "Germany"
    }
  ]
}

Running the request above using curl or the Kibana console (as we'll see later) will produce the following result:

Plain Text
{
  "_index": "customers",
  "_id": "ZEQsJI4BbwDzNcFB0ubC",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 1,
  "_primary_term": 1
}

This is the standard Elasticsearch response to a POST request. It confirms having created the index named customers and a new customer document in it, identified by an automatically generated ID (in this case, ZEQsJI4BbwDzNcFB0ubC). Other interesting parameters appear here as well, like _version and especially _shards. Without going into too much detail, Elasticsearch creates indexes as logical collections of documents.
Just like keeping paper documents in a filing cabinet, Elasticsearch keeps documents in an index. Each index is composed of shards, which are physical instances of Apache Lucene, the engine behind the scenes responsible for getting the data in and out of storage. They might be either primaries, storing documents, or replicas, storing, as the name suggests, copies of primary shards. More on that in the Elasticsearch documentation; for now, note that our index named customers is composed of two shards, one of which, of course, is the primary.

A final note: the POST request above doesn't mention an ID value, as it is automatically generated. While this is probably the most common use case, we could have provided our own ID value. In that case, the HTTP request to be used isn't POST anymore, but PUT.

Coming back to our domain model diagram, its central document is Order, stored in a dedicated collection named Orders. An Order is an aggregate of OrderItem documents, each of which points to its associated Product. An Order document also references the Customer who placed it. In Java, this is implemented as follows:

Java
public class Customer {
  private Long id;
  private String firstName, lastName;
  private InternetAddress email;
  private Set<Address> addresses;
  ...
}

The code above shows a fragment of the Customer class. This is a simple POJO (Plain Old Java Object) with properties like the customer's ID, first and last name, email address, and a set of postal addresses. Let's look now at the Order document.

Java
public class Order {
  private Long id;
  private String customerId;
  private Address shippingAddress;
  private Address billingAddress;
  private Set<String> orderItemSet = new HashSet<>();
  ...
}

Here you can notice some differences compared to the MongoDB version. As a matter of fact, with MongoDB we were using a reference to the customer instance associated with this order. This notion of reference doesn't exist with Elasticsearch and, hence, we use the document ID to create an association between the order and the customer who placed it. The same applies to the orderItemSet property, which creates an association between the order and its items. The rest of our domain model is quite similar and based on the same normalization ideas. For example, the OrderItem document:

Java
public class OrderItem {
  private String id;
  private String productId;
  private BigDecimal price;
  private int amount;
  ...
}

Here, we need to reference the product that is the object of the current order item. Last but not least, we have the Product document:

Java
public class Product {
  private String id;
  private String name, description;
  private BigDecimal price;
  private Map<String, String> attributes = new HashMap<>();
  ...
}

The Data Repositories

Quarkus Panache greatly simplifies the data persistence process by supporting both the active record and the repository design patterns. In Part 1, we used the Quarkus Panache extension for MongoDB to implement our data repositories, but there is not yet an equivalent Quarkus Panache extension for Elasticsearch. Accordingly, while waiting for a possible future Quarkus extension for Elasticsearch, we have to manually implement our data repositories using the dedicated Elasticsearch client. Elasticsearch is written in Java and, consequently, it is no surprise that it offers native support for invoking the Elasticsearch API using the Java client library.
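As a first taste of this client, here is a hedged sketch of the explicit-ID variant mentioned a moment ago. The index name and ID value are purely illustrative; over REST, this corresponds to a PUT customers/_doc/10 rather than a POST customers/_doc/:

Java
// Hedged sketch: indexing a customer under a caller-supplied ID with the Java client.
import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.elasticsearch.core.IndexRequest;
import co.elastic.clients.elasticsearch.core.IndexResponse;
...
IndexResponse response = client.index(IndexRequest.of(ir -> ir
  .index("customers")    // target index, created on first use
  .id("10")              // caller-supplied document ID
  .document(customer))); // the Customer POJO, serialized to JSON
assert "10".equals(response.id()); // the returned ID is the one we supplied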
This library is based on fluent API builder design patterns and provides both synchronous and asynchronous processing models. It requires Java 8 at a minimum. So, what do our fluent API builder-based data repositories look like? Below is an excerpt from the CustomerServiceImpl class, which acts as a data repository for the Customer document.

Java
@ApplicationScoped
public class CustomerServiceImpl implements CustomerService {
  private static final String INDEX = "customers";

  @Inject
  ElasticsearchClient client;

  @Override
  public String doIndex(Customer customer) throws IOException {
    return client.index(IndexRequest.of(ir -> ir.index(INDEX).document(customer))).id();
  }
  ...

As we can see, our data repository implementation must be a CDI bean with application scope. The Elasticsearch Java client is simply injected, thanks to the quarkus-elasticsearch-java-client Quarkus extension. This spares us lots of bells and whistles that we would have had to deal with otherwise. The only thing we need in order to inject the client is to declare the following property:

Properties files
quarkus.elasticsearch.hosts = elasticsearch:9200

Here, elasticsearch is the DNS (Domain Name System) name that we associate with the Elasticsearch database server in the docker-compose.yaml file, and 9200 is the TCP port number used by the server to listen for connections.

The method doIndex() above creates a new index named customers if it doesn't exist and indexes (stores) into it a new document representing an instance of the class Customer. The indexing process is performed based on an IndexRequest accepting as input arguments the index name and the document body. As for the document ID, it is automatically generated and returned to the caller for further reference. The following method allows us to retrieve the customer identified by the ID given as an input argument:

Java
...
@Override
public Customer getCustomer(String id) throws IOException {
  GetResponse<Customer> getResponse = client.get(GetRequest.of(gr -> gr.index(INDEX).id(id)), Customer.class);
  return getResponse.found() ? getResponse.source() : null;
}
...

The principle is the same: using the fluent API builder pattern, we construct a GetRequest instance in a similar way to what we did with the IndexRequest, and we run it against the Elasticsearch Java client. The other endpoints of our data repository, allowing us to perform full-text search operations or to update and delete customers, are designed the same way (a short illustrative sketch follows at the end of this section). Please take some time to look at the code to understand how things are working.

The REST API

Our MongoDB REST API interface was simple to implement, thanks to the quarkus-mongodb-rest-data-panache extension, whose annotation processor automatically generated all the required endpoints. With Elasticsearch, we don't benefit yet from the same comfort and, hence, we need to manually implement it.
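Before diving into the REST layer, here is the promised sketch of a delete endpoint. It is illustrative rather than the project's actual code; it assumes the same injected client, the INDEX constant, and the DeleteRequest type from the Elasticsearch Java client:

Java
// Illustrative sketch: a delete operation built with the same fluent pattern
// as doIndex() and getCustomer() above.
@Override
public void removeCustomerById(String id) throws IOException {
  // DeleteRequest names the index and the ID of the document to remove
  client.delete(DeleteRequest.of(dr -> dr.index(INDEX).id(id)));
}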
Manually implementing it is not a big deal, as we can simply inject the data repositories shown previously:

Java
@Path("customers")
@Produces(APPLICATION_JSON)
@Consumes(APPLICATION_JSON)
public class CustomerResourceImpl implements CustomerResource {
  @Inject
  CustomerService customerService;

  @Override
  public Response createCustomer(Customer customer, @Context UriInfo uriInfo) throws IOException {
    return Response.accepted(customerService.doIndex(customer)).build();
  }

  @Override
  public Response findCustomerById(String id) throws IOException {
    return Response.ok().entity(customerService.getCustomer(id)).build();
  }

  @Override
  public Response updateCustomer(Customer customer) throws IOException {
    customerService.modifyCustomer(customer);
    return Response.noContent().build();
  }

  @Override
  public Response deleteCustomerById(String id) throws IOException {
    customerService.removeCustomerById(id);
    return Response.noContent().build();
  }
}

This is the customer REST API implementation. The other ones, associated with orders, order items, and products, are similar. Let's see now how to run and test the whole thing.

Running and Testing Our Microservices

Now that we've looked at the details of our implementation, let's see how to run and test it. We chose to do it by means of the docker-compose utility. Here is the associated docker-compose.yml file:

YAML
version: "3.7"
services:
  elasticsearch:
    image: elasticsearch:8.12.2
    environment:
      node.name: node1
      cluster.name: elasticsearch
      discovery.type: single-node
      bootstrap.memory_lock: "true"
      xpack.security.enabled: "false"
      path.repo: /usr/share/elasticsearch/backups
      ES_JAVA_OPTS: -Xms512m -Xmx512m
    hostname: elasticsearch
    container_name: elasticsearch
    ports:
      - "9200:9200"
      - "9300:9300"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - node1-data:/usr/share/elasticsearch/data
    networks:
      - elasticsearch
  kibana:
    image: docker.elastic.co/kibana/kibana:8.6.2
    hostname: kibana
    container_name: kibana
    environment:
      - elasticsearch.url=http://elasticsearch:9200
      - csp.strict=false
    ulimits:
      memlock:
        soft: -1
        hard: -1
    ports:
      - 5601:5601
    networks:
      - elasticsearch
    depends_on:
      - elasticsearch
    links:
      - elasticsearch:elasticsearch
  docstore:
    image: quarkus-nosql-tests/docstore-elasticsearch:1.0-SNAPSHOT
    depends_on:
      - elasticsearch
      - kibana
    hostname: docstore
    container_name: docstore
    links:
      - elasticsearch:elasticsearch
      - kibana:kibana
    ports:
      - "8080:8080"
      - "5005:5005"
    networks:
      - elasticsearch
    environment:
      JAVA_DEBUG: "true"
      JAVA_APP_DIR: /home/jboss
      JAVA_APP_JAR: quarkus-run.jar
volumes:
  node1-data:
    driver: local
networks:
  elasticsearch:

This file instructs the docker-compose utility to run three services:

* A service named elasticsearch running the Elasticsearch 8.12.2 database
* A service named kibana running the multipurpose web console providing different options such as executing queries, creating aggregations, and developing dashboards and graphs
* A service named docstore running our Quarkus microservice

Now, you may check that all the required processes are running:

Shell
$ docker ps
CONTAINER ID   IMAGE                                                     COMMAND                  CREATED      STATUS      PORTS                                                                                            NAMES
005ab8ebf6c0   quarkus-nosql-tests/docstore-elasticsearch:1.0-SNAPSHOT   "/opt/jboss/containe…"   3 days ago   Up 3 days   0.0.0.0:5005->5005/tcp, :::5005->5005/tcp, 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp, 8443/tcp   docstore
9678c0a04307   docker.elastic.co/kibana/kibana:8.6.2                     "/bin/tini -- /usr/l…"   3 days ago   Up 3 days   0.0.0.0:5601->5601/tcp, :::5601->5601/tcp                                                        kibana
805eba38ff6c   elasticsearch:8.12.2                                      "/bin/tini -- /usr/l…"   3 days ago   Up 3 days   0.0.0.0:9200->9200/tcp, :::9200->9200/tcp, 0.0.0.0:9300->9300/tcp, :::9300->9300/tcp             elasticsearch
$

To confirm that the Elasticsearch server is available and able to run queries, you can connect to Kibana at http://localhost:5601. After scrolling down the page and selecting Dev Tools in the preferences menu, you can run queries as shown below.

In order to test the microservices, proceed as follows:

1. Clone the associated GitHub repository:

Shell
$ git clone https://github.com/nicolasduminil/docstore.git

2. Go to the project:

Shell
$ cd docstore

3. Check out the right branch:

Shell
$ git checkout elastic-search

4. Build:

Shell
$ mvn clean install

5. Run the integration tests:

Shell
$ mvn -DskipTests=false failsafe:integration-test

This last command will run the 17 provided integration tests, which should all succeed. You can also use the Swagger UI interface for testing purposes by pointing your preferred browser at http://localhost:8080/q/swagger-ui. Then, in order to test the endpoints, you can use the payloads in the JSON files located in the src/resources/data directory of the docstore-api project. Enjoy!
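As a bonus, if you prefer a programmatic smoke test to the Swagger UI, here is a minimal sketch in plain Java, assuming the docstore service is listening on localhost:8080; the trimmed-down payload is illustrative, and the full customer JSON from the data files works equally well:

Java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DocstoreSmokeTest {
  public static void main(String[] args) throws Exception {
    // Trimmed-down customer payload, illustrative only
    String payload = "{\"id\": 10, \"firstName\": \"John\", \"lastName\": \"Doe\"}";
    HttpRequest request = HttpRequest.newBuilder()
      .uri(URI.create("http://localhost:8080/customers"))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(payload))
      .build();
    // createCustomer() responds with 202 Accepted and the generated document ID
    HttpResponse<String> response = HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.statusCode() + " -> " + response.body());
  }
}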
Parameterized tests allow developers to efficiently test their code with a range of input values. In the realm of JUnit testing, seasoned users have long grappled with the complexities of implementing these tests. But with the release of JUnit 5.7, a new era of test parameterization arrived, offering developers first-class support and enhanced capabilities. Let's delve into the exciting possibilities that JUnit 5.7 brings to the table for parameterized testing!

Parameterization Samples From JUnit 5.7 Docs

Let's see some examples from the docs:

Java
@ParameterizedTest
@ValueSource(strings = { "racecar", "radar", "able was I ere I saw elba" })
void palindromes(String candidate) {
    assertTrue(StringUtils.isPalindrome(candidate));
}

@ParameterizedTest
@CsvSource({
    "apple, 1",
    "banana, 2",
    "'lemon, lime', 0xF1",
    "strawberry, 700_000"
})
void testWithCsvSource(String fruit, int rank) {
    assertNotNull(fruit);
    assertNotEquals(0, rank);
}

@ParameterizedTest
@MethodSource("stringIntAndListProvider")
void testWithMultiArgMethodSource(String str, int num, List<String> list) {
    assertEquals(5, str.length());
    assertTrue(num >= 1 && num <= 2);
    assertEquals(2, list.size());
}

static Stream<Arguments> stringIntAndListProvider() {
    return Stream.of(
        arguments("apple", 1, Arrays.asList("a", "b")),
        arguments("lemon", 2, Arrays.asList("x", "y"))
    );
}

The @ParameterizedTest annotation has to be accompanied by one of several provided source annotations describing where to take the parameters from. The source of the parameters is often referred to as the "data provider." I will not dive into their detailed description here: the JUnit user guide does it better than I could, but allow me to share several observations:

* @ValueSource is limited to providing a single parameter value only. In other words, the test method cannot have more than one argument, and the types one can use are restricted as well.
* Passing multiple arguments is somewhat addressed by @CsvSource, parsing each string into a record that is then passed as arguments field by field. This can easily get hard to read with long strings and/or plentiful arguments. The types one can use are also restricted — more on this later.
* All the sources that declare the actual values in annotations are restricted to values that are compile-time constants (a limitation of Java annotations, not JUnit).
* @MethodSource and @ArgumentsSource provide a stream/collection of (un-typed) n-tuples that are then passed as method arguments. Various actual types are supported to represent the sequence of n-tuples, but none of them guarantee that they will fit the method's argument list. This kind of source requires additional methods or classes, but it places no restriction on where and how to obtain the test data.

As you can see, the source types available range from the simple ones (simple to use, but limited in functionality) to the ultimately flexible ones that require more code to get working.

Sidenote: This is generally a sign of good design: a little code is needed for essential functionality, and adding extra complexity is justified when it enables a more demanding use case.

What does not seem to fit this hypothetical simple-to-flexible continuum is @EnumSource. Take a look at the non-trivial example below of four parameter sets with two values each.

Note: While @EnumSource passes the enum's value as a single test method parameter, conceptually, the test is parameterized by the enum's fields, which places no restriction on the number of parameters.
Java
enum Direction {
    UP(0, '^'), RIGHT(90, '>'), DOWN(180, 'v'), LEFT(270, '<');

    private final int degrees;
    private final char ch;

    Direction(int degrees, char ch) {
        this.degrees = degrees;
        this.ch = ch;
    }
}

@ParameterizedTest
@EnumSource
void direction(Direction dir) {
    assertEquals(0, dir.degrees % 90);
    assertFalse(Character.isWhitespace(dir.ch));

    int orientation = player.getOrientation();
    player.turn(dir);
    assertEquals((orientation + dir.degrees) % 360, player.getOrientation());
}

Just think of it: the hardcoded list of values restricts its flexibility severely (no external or generated data), while the amount of additional code needed to declare the enum makes this quite a verbose alternative to, say, @CsvSource. But that is just a first impression. We will see how elegant this can get when leveraging the true power of Java enums.

Sidenote: This article does not address the verification of enums that are part of your production code. Those, of course, have to be declared no matter how you choose to verify them. Instead, it focuses on when and how to express your test data in the form of enums.

When To Use It

There are situations when enums perform better than the alternatives:

Multiple Parameters per Test

When all you need is a single parameter, you likely do not want to complicate things beyond @ValueSource. But as soon as you need multiple — say, inputs and expected results — you have to resort to @CsvSource, @MethodSource/@ArgumentsSource, or @EnumSource. In a way, an enum lets you "smuggle in" any number of data fields. So when you need to add more test method parameters in the future, you simply add more fields to your existing enums, leaving the test method signatures untouched. This becomes priceless when you reuse your data provider in multiple tests. With the other sources, one has to employ ArgumentsAccessors or ArgumentsAggregators to get the flexibility that enums have out of the box.

Type Safety

For Java developers, this should be a big one. Parameters read from CSV (files or literals), @MethodSource, or @ArgumentsSource provide no compile-time guarantee that the parameter count and types are going to match the signature. Obviously, JUnit is going to complain at runtime, but forget about any code assistance from your IDE. As before, this adds up when you reuse the same parameters for multiple tests. Using a type-safe approach is a huge win when extending the parameter set in the future.

Custom Types

This is mostly an advantage over text-based sources, such as the ones reading data from CSV, where the values encoded in the text need to be converted to Java types. If you have a custom class to instantiate from the CSV record, you can do it using an ArgumentsAggregator. However, your data declaration is still not type-safe: any mismatch between the method signature and the declared data will pop up at runtime when "aggregating" arguments. Not to mention that declaring the aggregator class adds more support code needed for your parameterization to work. And avoiding extra code was the very reason we favored @CsvSource over @EnumSource in the first place.

Documentable

Unlike the other methods, the enum source has Java symbols for both the parameter sets (enum instances) and all the parameters they contain (enum fields). These provide a straightforward place to attach documentation in its most natural form: JavaDoc. It is not that documentation cannot be placed elsewhere, but then it will be — by definition — placed further from what it documents, and thus harder to find and easier to become outdated.
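To make the point concrete, here is a small hedged sketch, with entirely hypothetical fixture data, of how JavaDoc attaches directly to parameter sets and their fields:

Java
/** Hypothetical fixtures for a discount calculator; not from any real code base. */
enum DiscountCase {
    /** A first-time buyer gets no discount. */
    NEW_CUSTOMER(0, 0.00),
    /** Five previous orders unlock the regular 5% loyalty discount. */
    LOYAL_CUSTOMER(5, 0.05);

    /** Number of previous orders in this fixture. */
    final int previousOrders;
    /** Discount rate the calculator is expected to return. */
    final double expectedDiscount;

    DiscountCase(int previousOrders, double expectedDiscount) {
        this.previousOrders = previousOrders;
        this.expectedDiscount = expectedDiscount;
    }
}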
But There Is More!

Now: Enums. Are. Classes. It seems that many junior developers have yet to realize how powerful Java enums truly are. In other programming languages, they really are just glorified constants. But in Java, they are convenient little implementations of the Flyweight design pattern with most of the advantages of full-blown classes. Why is that a good thing?

Test Fixture-Related Behavior

As with any other class, enums can have methods added to them. This becomes handy if enum test parameters are reused between tests — same data, just tested a little differently. To effectively work with the parameters without significant copy and paste, some helper code needs to be shared between those tests as well. It is not something a helper class and a few static methods would not "solve."

Sidenote: Notice that such a design suffers from Feature Envy. Test methods — or worse, helper class methods — would have to pull the data out of the enum objects to perform actions on that data. While this is the (only) way in procedural programming, in the object-oriented world, we can do better. By declaring the "helper" methods right in the enum declaration itself, we move the code to where the data is. Or, to put it in OOP lingo, the helper methods become the "behavior" of the test fixtures implemented as enums. This not only makes the code more idiomatic (calling sensible methods on instances rather than passing data around to static methods), but it also makes it easier to reuse enum parameters across test cases.

Inheritance

Enums can implement interfaces with (default) methods. Used sensibly, this can be leveraged to share behavior between several data providers — several enums. An example that easily comes to mind is separate enums for positive and negative tests. If they represent a similar kind of test fixture, chances are they have some behavior to share. (A short sketch of this idea closes the article.)

The Talk Is Cheap

Let's illustrate this on a test suite for a hypothetical converter of source code files, not quite unlike the one performing Python 2 to 3 conversion. To have real confidence in what such a comprehensive tool does, one would end up with an extensive set of input files manifesting various aspects of the language, and matching files to compare the conversion results against. Beyond that, we need to verify which warnings/errors are presented to the user for problematic inputs. This is a natural fit for parameterized tests due to the large number of samples to verify, but it does not quite fit any of the simple JUnit parameter sources, as the data are somewhat complex. See below:

Java
enum Conversion {
    CLEAN("imports-correct.2.py", "imports-correct.3.py", Set.of()),
    WARNINGS("problematic.2.py", "problematic.3.py", Set.of(
        "Using module 'xyz' that is deprecated"
    )),
    SYNTAX_ERROR("syntax-error.py", new RuntimeException("Syntax error on line 17"));
    // Many, many others ...
    @Nonnull
    final String inFile;
    @CheckForNull
    final String expectedOutput;
    @CheckForNull
    final Exception expectedException;
    @Nonnull
    final Set<String> expectedWarnings;

    Conversion(@Nonnull String inFile, @Nonnull String expectedOutput, @Nonnull Set<String> expectedWarnings) {
        this(inFile, expectedOutput, null, expectedWarnings);
    }

    Conversion(@Nonnull String inFile, @Nonnull Exception expectedException) {
        this(inFile, null, expectedException, Set.of());
    }

    Conversion(@Nonnull String inFile, String expectedOutput, Exception expectedException, @Nonnull Set<String> expectedWarnings) {
        this.inFile = inFile;
        this.expectedOutput = expectedOutput;
        this.expectedException = expectedException;
        this.expectedWarnings = expectedWarnings;
    }

    public File getV2File() { ... }
    public File getV3File() { ... }
}

@ParameterizedTest
@EnumSource
void upgrade(Conversion con) {
    try {
        File actual = convert(con.getV2File());
        if (con.expectedException != null) {
            fail("No exception thrown when one was expected", con.expectedException);
        }
        assertEquals(con.expectedWarnings, getLoggedWarnings());
        new FileAssert(actual).isEqualTo(con.getV3File());
    } catch (Exception ex) {
        assertTypeAndMessageEquals(con.expectedException, ex);
    }
}

The usage of enums does not restrict us in how complex the data can be. As you can see, we can define several convenient constructors in the enum, so declaring new parameter sets is nice and clean. This avoids long argument lists that often end up filled with many "empty" values (nulls, empty strings, or collections) and leave one wondering what argument #7 — you know, one of the nulls — actually represents. Notice how enums enable the use of complex types (Set, RuntimeException) with no restrictions or magical conversions. Passing such data is also completely type-safe.

Now, I know what you're thinking. This is awfully wordy. Well, up to a point. Realistically, you are going to have a lot more data samples to verify, so the amount of boilerplate code will be less significant in comparison. Also, see how related tests can be written leveraging the same enums and their helper methods:

Java
@ParameterizedTest
@EnumSource
// Upgrading files already upgraded always passes, makes no changes, issues no warnings.
void upgradeFromV3toV3AlwaysPasses(Conversion con) throws Exception {
    File actual = convert(con.getV3File());
    assertEquals(Set.of(), getLoggedWarnings());
    new FileAssert(actual).isEqualTo(con.getV3File());
}

@ParameterizedTest
@EnumSource
// Downgrading files created by the upgrade procedure is expected to always pass without warnings.
void downgrade(Conversion con) throws Exception {
    File actual = convert(con.getV3File());
    assertEquals(Set.of(), getLoggedWarnings());
    new FileAssert(actual).isEqualTo(con.getV2File());
}

Some More Talk After All

Conceptually, @EnumSource encourages you to create a complex, machine-readable description of individual test scenarios, blurring the line between data providers and test fixtures. One other great thing about having each data set expressed as a Java symbol (enum element) is that each can be used individually, completely outside of data providers and parameterized tests. Since they have a reasonable name and are self-contained (in terms of data and behavior), they contribute to nice and readable tests.
Java
@Test
void warnWhenNoEventsReported() throws Exception {
    FixtureXmls.Invalid events = FixtureXmls.Invalid.NO_EVENTS_REPORTED;
    // read() is a helper method that is shared by all FixtureXmls
    try (InputStream is = events.read()) {
        EventList el = consume(is);
        assertEquals(Set.of(...), el.getWarnings());
    }
}

Now, @EnumSource is not going to be one of your most frequently used argument sources, and that is a good thing, as overusing it would do no good. But in the right circumstances, it comes in handy to know how to use all that enums have to offer.
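To close the loop on the inheritance point from earlier, here is a hedged sketch, with hypothetical names and files loosely modeled on the FixtureXmls example above, of how two fixture enums might share their read() helper through an interface with a default method:

Java
// Hypothetical illustration: positive and negative fixtures share behavior.
interface FixtureFile {
    String fileName(); // each enum constant supplies its sample file name

    /** Shared behavior: open the sample file from the test classpath. */
    default java.io.InputStream read() {
        return FixtureFile.class.getResourceAsStream("/fixtures/" + fileName());
    }
}

enum ValidXml implements FixtureFile {
    SINGLE_EVENT("single-event.xml"),
    MANY_EVENTS("many-events.xml");

    private final String fileName;
    ValidXml(String fileName) { this.fileName = fileName; }
    @Override public String fileName() { return fileName; }
}

enum InvalidXml implements FixtureFile {
    NO_EVENTS_REPORTED("no-events.xml"),
    TRUNCATED("truncated.xml");

    private final String fileName;
    InvalidXml(String fileName) { this.fileName = fileName; }
    @Override public String fileName() { return fileName; }
}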
“The Mixtral-8x7B Large Language Model (LLM) is a pre-trained generative Sparse Mixture of Experts.” When I saw this come out, it seemed pretty interesting and accessible, so I gave it a try. With the proper prompting, it seems good. I am not sure if it’s better than Google Gemma, Meta LLAMA2, or OLLAMA Mistral for my use cases.

Today I will show you how to utilize the new Mixtral LLM with Apache NiFi. It will take only a few steps to run Mixtral against your text inputs. This model can be served via the lightweight serverless REST API or the transformers library. You can also use this GitHub repository. The context can have up to 32k tokens, and you can enter prompts in English, Italian, German, Spanish, and French.

You have a lot of options for how to utilize this model, but I will show you how to build a real-time LLM pipeline utilizing Apache NiFi. One key thing to decide is what kind of input you are going to have (chat, code generation, Q&A, document analysis, summary, etc.). Once you have decided, you will need to do some prompt engineering and tweak your prompt. In the following section, I include a few guides to help you improve your prompt-building skills, and I will cover some basic prompt engineering in my walk-through tutorial.

Guides To Build Your Prompts Optimally

* Mixtral: Prompt Engineering Guide
* Getting Started with Mixtral 8X7B

The construction of the prompt is critical to making this work well, so we are building it with NiFi.

Overview of the Flow

Step 1: Build and Format Your Prompt

In building our application, the following is the basic prompt template that we are going to use:

Prompt Template
{
  "inputs": "<s>[INST]Write a detailed complete response that appropriately answers the request.[/INST] [INST]Use this information to enhance your answer: ${context:trim():replaceAll('"',''):replaceAll('\n', '')}[/INST] User: ${inputs:trim():replaceAll('"',''):replaceAll('\n', '')}</s>"
}

You will enter this prompt in a ReplaceText processor, in the Replacement Value field.

Step 2: Build Our Call to the HuggingFace REST API To Classify Against the Model

Add an InvokeHTTP processor to your flow, setting the HTTP URL to the Mixtral API URL.

Step 3: Query To Convert and Clean Your Results

We use the QueryRecord processor to clean and convert the HuggingFace results, grabbing the generated_text field.

Step 4: Add Metadata Fields

We use the UpdateRecord processor to add metadata fields, using the JSON readers and writers and the Literal Value Replacement Value Strategy. The fields we are adding take their values from flowfile attributes.

Overview of Send to Kafka and Slack

Step 5: Add Metadata to the Stream

We use the UpdateAttribute processor to set the Content-Type to "application/json" and the model type to Mixtral.

Step 6: Publish This Cleaned Record to a Kafka Topic

We send it to our local Kafka broker (could be Docker or another) and to our flank-mixtral8x7B topic. If this topic doesn't exist, NiFi and Kafka will automagically create it for you.

Step 7: Retry the Send

If something goes wrong, we will try to resend three times, then fail.

Overview of Pushing Data to Slack

Step 8: Send the Same Data to Slack for the User Reply

The first step is to split the data into single records so we can send one at a time. We use the SplitRecord processor for this. As before, reuse the JSON Tree Reader and JSON Record Set Writer. As usual, choose "1" as the Records Per Split.

Step 9: Make the Generated Text Available for Messaging

We utilize EvaluateJsonPath to extract the generated text from Mixtral (on HuggingFace).
Step 10: Send the Reply to Slack

We use the PublishSlack processor, which is new in Apache NiFi 2.0. It requires your channel name or channel ID. We choose the Publish Strategy of Use 'Message Text' Property. For the Message Text, use the Slack response template below.

For the final reply to the user, we will need a Slack response template formatted for how we wish to communicate. Below is an example that has the basics:

Slack Response Template
===============================================================================================================
HuggingFace ${modelinformation} Results on ${date}:

Question: ${inputs}

Answer: ${generated_text}

=========================================== Data for nerds ====

HF URL: ${invokehttp.request.url}
TXID: ${invokehttp.tx.id}

== Slack Message Meta Data ==
ID: ${messageid}
Name: ${messagerealname} [${messageusername}]
Time Zone: ${messageusertz}

== HF ${modelinformation} Meta Data ==
Compute Characters/Time/Type: ${x-compute-characters} / ${x-compute-time} / ${x-compute-type}
Generated/Prompt Tokens/Time per Token: ${x-generated-tokens} / ${x-prompt-tokens} : ${x-time-per-token}
Inference Time: ${x-inference-time} // Queue Time: ${x-queue-time}
Request ID/SHA: ${x-request-id} / ${x-sha}
Validation/Total Time: ${x-validation-time} / ${x-total-time}
===============================================================================================================

When this is run, it will look like the image below in Slack.

You have now sent a prompt to HuggingFace, had it run against Mixtral, sent the results to Kafka, and responded to the user via Slack. We have completed a full Mixtral application with zero code.

Conclusion

You have now built a full round trip utilizing Apache NiFi, HuggingFace, and Slack to build a chatbot utilizing the new Mixtral model.

Summary of Learnings

* Learned how to build a decent prompt for HuggingFace Mixtral
* Learned how to clean up streaming data
* Built a HuggingFace REST call that can be reused
* Processed HuggingFace model call results
* Sent your first Kafka message
* Formatted and built Slack calls
* Built a full DataFlow for GenAI

If you need additional tutorials on utilizing the new Apache NiFi 2.0, check out: Apache NiFi 2.0.0-M2 Out!

For additional information on building Slack bots:

* Building a Real-Time Slackbot With Generative AI
* Building an LLM Bot for Meetups and Conference Interactivity

Also, thanks for following my tutorial. I am working on additional Apache NiFi 2 and Generative AI tutorials that will be coming to DZone. Finally, if you are in Princeton, Philadelphia, or New York City, please come out to my meetups for in-person hands-on work with these technologies.

Resources

* Mixtral of Experts
* Mixture of Experts Explained
* mistralai/Mixtral-8x7B-v0.1
* Mixtral Overview
* Invoke the Mixtral 8x7B model on Amazon Bedrock for text generation
* Running Mixtral 8x7b on M1 16GB
* Mixtral-8x7B: Understanding and Running the Sparse Mixture of Experts by Mistral AI
* Retro-Engineering a Database Schema: Mistral Models vs. GPT4, LLama2, and Bard (Episode 3)
* Comparison of Models: Quality, Performance & Price Analysis
* A Beginner’s Guide to Fine-Tuning Mixtral Instruct Model
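For readers curious what the InvokeHTTP step boils down to outside of NiFi, here is a hedged plain-Java sketch of the same HuggingFace Inference API request; the endpoint path, model name, and HF_TOKEN environment variable are assumptions to verify against the current HuggingFace documentation:

Java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class MixtralCallSketch {
  public static void main(String[] args) throws Exception {
    // Assumed hosted-inference endpoint; verify against the HuggingFace docs
    String url = "https://api-inference.huggingface.co/models/mistralai/Mixtral-8x7B-Instruct-v0.1";
    // Same shape as the NiFi prompt template, with a hardcoded question
    String body = "{\"inputs\": \"<s>[INST]Write a detailed complete response that appropriately answers the request.[/INST] User: What is Apache NiFi?</s>\"}";
    HttpRequest request = HttpRequest.newBuilder()
      .uri(URI.create(url))
      .header("Authorization", "Bearer " + System.getenv("HF_TOKEN")) // assumed env var holding your API token
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(body))
      .build();
    HttpResponse<String> response = HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString());
    // The response is a JSON array whose first element carries the generated_text field
    System.out.println(response.body());
  }
}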
Why Do Organizations Need Secure Development Environments?

The need to secure corporate IT environments is common to all functions of organizations, and software application development is one of them. At its core, the need for securing IT environments arises from the digital corporate assets that organizations carry. It’s often data subject to privacy concerns, typically under regulations such as GDPR or HIPAA, or application source code, credentials, and, most recently, operational data that can have strategic significance. Threat scenarios attached to corporate data are not only about leaking data to outsiders but also about preventing insiders with nefarious intent from exfiltrating data. Hence, the security problem is multifaceted: it spans from careless asset handling to willful mishandling.

In the case of environments for software application development, the complexity of the security problem lies in addressing the diversity of these environments’ settings. They range from data access needs and environment configuration to the developer’s relationship with the company; e.g., internal employee, consultant, temporary, etc. Security aside, development environments have notoriously complex setups and often require significant maintenance because many applications and data are locally present on the device’s internal storage; for example, the integrated development environment (IDE) and the application’s source code. Hence, for these environments, data protection against leaks will target locally stored assets such as source code, credentials, and potentially sensitive data.

Assessing the Risk of Locally Stored Data

Let’s first take a quick step back in ICT history and look at an oft-cited 2010 benchmark study named "The Billion Dollar Lost Laptop Problem." The study looked at 329 organizations over 12 months and reported that over 86,000 laptops were stolen or lost, resulting in a loss of 2.1 billion USD, an average of 6.4 million USD per organization. In 2010, the use of the Cloud as a storage medium for corporate data was nascent; hence, today, the metrics to determine the cost and impact of the loss of a corporate laptop would likely look very different. For example, for many of the business functions that were likely to be impacted at that time, Cloud applications have since brought a solution by removing sensitive data from employees’ laptops. This has mostly shifted the discussion on laptop security to protecting the credentials required to access Cloud (or self-hosted) business resources, rather than protecting locally stored data itself.

Figure 1: In 2024, most business productivity data has already moved to the cloud. Back in the 2010s, a notable move was CRM data, which ended up greatly reducing the risk of corporate data leaks.

There is, though, a notable exception to the above shift in technology: the environments used for code development. For practical reasons, devices used for development today hold a replica of projects’ source code, in addition to corporate secrets such as credentials, web tokens, and cryptographic keys, and perhaps strategic data used to train machine learning models or to test algorithms. In other words, there is still plenty of interesting data stored locally in development environments that warrants protection against loss or theft. Therefore, the interest in securing development environments has not waned.
There are a variety of reasons for malicious actors to go after assets in these environments, from accessing corporate intellectual property (see the hack of Grand Theft Auto 6) to understanding the existing vulnerabilities of an application in order to compromise it in operation. Once compromised, the application might provide access to sensitive data such as personal user information, including credit card numbers; see, for example, the source code hack at Samsung. The final intent here is again to leak potentially sensitive or personal data. Recent and notorious hacks of this kind include the one at password manager company LastPass and the Mercedes hack in early 2024.

Despite all these potential downfalls resulting from the hacking of a single developer’s environment, few companies today can accurately determine where the replicas of their source code, secrets, and data are (hint: likely all over the devices of their distributed workforce), and they are poorly shielded against the loss of a laptop or a looming insider threat. Recall that using an online or self-hosted source code repository such as GitHub does not get rid of the replicas in developers’ environments. This is because local replicas are needed for developers to update the code before sending it back to the online Git repository. Hence, protecting these environments is a problem that grows with the number of developers working in the organization.

Use Cases for Virtual Desktops and Secure Developer Laptops

The desire to remove data from developers’ environments is prevalent across many regulated industries such as finance and insurance. One of the most common approaches is the use of development machines accessed remotely. Citrix and VMware have been key actors in this market, enabling developers to remotely access virtual machines hosted by the organization. In addition, these platforms implement data loss prevention mechanisms that monitor user activities to prevent data exfiltration.

Figure 2: Left: Developers remotely access virtual machines hosted by the organization. Right: Virtualization has evolved from emulating machines to emulating processes, a staple of DevOps.

Running and accessing a virtual machine remotely for development has many drawbacks, in particular for the developer’s productivity. One reason is that the streaming mechanism used to access the remote desktop requires significant bandwidth to be truly usable and often results in irritating lags when typing code. The entire apparatus is also complex to set up, as well as costly to maintain and operate for the organization. In particular, a virtual machine is quite a heavy mechanism that requires significant computational resources (hence cost) to run. Finally, such a setup is general-purpose; i.e., it is not designed specifically for code development and requires the installation of the entire development tool suite.

For the reasons explained above, many organizations have reverted to securing developer laptops using endpoint security mechanisms implementing data loss prevention measures. As with the VDI counterpart, this is often a costly solution because such laptops have complex setups. When onboarding remote development teams, organizations often send these laptops through the mail at great expense, which complicates the maintenance and monitoring process.
The Case for Secure Cloud Development Environments

Recently, virtualization has evolved from emulating entire machines to the granularity of single processes, with the technology of software containers. Containers are well-suited for code development because they provide a minimal and sufficient environment to compile typical applications, in particular web-based ones. Notably, in comparison to virtual machines, containers start in seconds instead of minutes and require far fewer computational resources to execute.

Containers are typically tools used locally by developers on their devices to isolate the software dependencies related to a specific project, so that the source code can be compiled and executed without interference from potentially unwanted settings. The great thing about containers is that they don’t have to remain a locally used development tool. They can be run online and used as an alternative to a virtual machine. This is the basic mechanism used to implement a Cloud Development Environment (CDE).

Figure 3: Containers can be run online and become a lightweight alternative to a virtual machine. This is the basic mechanism to implement a Cloud Development Environment.

Running containers online has been one of the most exciting recent trends in virtualization, aligned with DevOps practices, where containers are critical to enabling efficient testing and deployments. CDEs are accessed online either with a desktop IDE over a network connection (Microsoft Visual Studio Code has such a feature, as explained here) or using a Cloud IDE (an IDE running in a web browser, such as Microsoft Visual Studio Code, Eclipse Theia, and others). A Cloud IDE allows a developer to access a CDE with the benefit that no environment needs to be installed on the local device. Access to the remote container is done transparently. Compared to the remotely executed desktop described before, the discomfort of a streamed environment does not apply here, since the IDE executes as a web application in the browser. Hence, the developer will not suffer the display lags typical of VDI and DaaS, particularly in low-bandwidth environments. Bandwidth requirements between the IDE and the CDE are low because only text information is exchanged between the two.

Figure 4: Access to the remote container is done with an IDE running in a web browser; hence, developers will not suffer display lags, particularly in low-bandwidth environments.

As a result, in the specific context of application development, the use of CDEs is a lightweight mechanism to remove development data from local devices. However, this still does not achieve the security delivered by Citrix and other VDI platforms, because CDEs are designed for efficiency, not for security: they do not provide any data loss prevention mechanism. This is where the case for secure Cloud Development Environments lies: CDEs with data loss prevention provide a lightweight alternative to VDI or secure development laptops, with the additional benefit of an improved developer experience. The resulting platform is a secure Cloud Development platform. Using such a platform, organizations can start to significantly reduce the cost of provisioning secure development environments for their developers.

Figure 5: To become a replacement for VDIs or secure laptops, Cloud Development Environments need to include security measures against data leaks.
Moving From Virtual Desktops to Secure Cloud Development Environments

As a conclusion to this discussion, below I briefly retrace the steps that build the case for a secure Cloud-based development platform, one that combines the efficient infrastructure of CDEs with end-to-end protection against data exfiltration, leading to a secure CDE. Initially, secure developer laptops were used to directly access corporate resources, sometimes using a VPN when outside the IT perimeter. According to the benchmark study mentioned at the beginning of this article, 41% of laptops routinely contained sensitive data. Then, the use of virtual machines and early access to web applications allowed organizations to remove data from local laptop storage, but code development on remote virtual machines was and remains strenuous. Recently, the use of lightweight virtualization based on containers has allowed quicker access to online development environments, but current vendors in this space do not address data security, since their primary use case is productivity.

Figure 6: A representation of the technological evolution of the mechanisms used by organizations to provision secure development environments over the last decade.

Finally, a secure Cloud Development Environment platform (as shown in the rightmost part of the figure) is the closest incarnation of the secure development laptop. Secure CDEs benefit from the experiences of pioneering companies like Citrix, seizing the chance to separate development environments from traditional hardware. This separation allows for a blend of infrastructure efficiency and security without compromising developers' experience.