DataScale collects and processes data in real time from numerous data sources and imports it into the Yanhuang Data Platform.
Product Highlights
Quickly Connect to Various Data Sources
Provides an intuitive Web UI to manage data pipelines, quickly connect to various data sources, and configure data processing logic.
Integrated Open-Source Data Collectors
Integrates multiple open-source data collectors and supports a wide range of out-of-the-box data source types and data processing methods.
Easy Debugging of Data Collection Logic
Provides convenient tools for debugging data collection configurations and data processing logic.
Custom Data Collector Support
Lets users develop custom data collectors for collection needs that the generic collectors do not cover.
Centralized Management of Large-Scale Deployments
Provides easy-to-use bulk deployment and cluster management features.
Observability
Offers enhanced observability for data pipelines.
2.18.0
New Features:
A lighter-weight, worker-only DataScale installation package is now available;
The DataScale exec source now supports running tasks on a crontab schedule;
The DataScale file source has been updated:
The encoding used to read files can now be configured on the page;
The file fingerprint can now be configured on the page, and two new fingerprint strategies, full_content_checksum and modification_time, have been added;
Added the trigger_wait_sec configuration, which controls how long to hold off before reading updated file content.
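To illustrate what the two new fingerprint strategy names suggest, here is a minimal Python sketch. It is not DataScale's implementation, which is not described in these notes. The function names, the choice of SHA-256, and the use of the nanosecond modification timestamp are all assumptions made for illustration.

```python
import hashlib
import os


def full_content_checksum(path: str, chunk_size: int = 65536) -> str:
    """Fingerprint a file by hashing all of its bytes.

    SHA-256 is an assumption; the hash DataScale uses is not documented here.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def modification_time(path: str) -> str:
    """Fingerprint a file by its last-modified timestamp (nanoseconds)."""
    return str(os.stat(path).st_mtime_ns)


# Hypothetical lookup table keyed by the strategy names from the release notes.
FINGERPRINT_STRATEGIES = {
    "full_content_checksum": full_content_checksum,
    "modification_time": modification_time,
}


def fingerprint(path: str, strategy: str) -> str:
    """Apply the named strategy to the file at `path`."""
    return FINGERPRINT_STRATEGIES[strategy](path)
```

Under these assumptions, a content checksum changes only when the bytes of the file change, while a modification-time fingerprint changes whenever the file is rewritten or touched, even if its content is identical. The checksum is the more expensive of the two because it reads the whole file.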
Bug Fixes:
Optimized DataScale throughput storage and queries to avoid traffic query timeouts and over-capacity problems in large-scale clustered environments;
Fixed an issue where a group specified in the configuration file when a worker was created was shown on the page as not assigned to any worker;
Fixed an issue where, with heartbeat jitter configured, the number of running workers shown on the page could differ each time dataflow was refreshed.
For installation, upgrade and usage instructions, please refer to:
Click the buttons below to download DataScale 2.18.0: