chore(cache): default to SimpleCache in debug mode #18976

Merged Mar 2, 2022 (15 commits)
1 change: 1 addition & 0 deletions UPDATING.md
@@ -26,6 +26,7 @@ assists people when migrating to a new version.

### Breaking Changes

- [18976](https://github.com/apache/superset/pull/18976): A new `DEFAULT_CACHE_CONFIG` parameter has been introduced in `config.py` which makes it possible to define a default cache config that will be used as the basis for all cache configs. When running the app in debug mode, the app will default to use `SimpleCache`; in other cases the default cache type will be `NullCache`. In addition, `DEFAULT_CACHE_TIMEOUT` has been deprecated and moved into `DEFAULT_CACHE_CONFIG` (will be removed in Superset 2.0). For installations using Redis or other caching backends, it is recommended to set the default cache options in `DEFAULT_CACHE_CONFIG` to ensure the primary cache is always used if new caches are added.
- [17881](https://github.com/apache/superset/pull/17881): Previously simple adhoc filter values on string columns were stripped of enclosing single and double quotes. To fully support literal quotes in filters, both single and double quotes will no longer be removed from filter values.
- [17984](https://github.com/apache/superset/pull/17984): Default Flask SECRET_KEY has changed for security reasons. You should always override with your own secret. Set `PREVIOUS_SECRET_KEY` (ex: PREVIOUS_SECRET_KEY = "\2\1thisismyscretkey\1\2\\e\\y\\y\\h") with your previous key and use `superset re-encrypt-secrets` to rotate your current secrets.
- [15254](https://github.com/apache/superset/pull/15254): Previously `QUERY_COST_FORMATTERS_BY_ENGINE`, `SQL_VALIDATORS_BY_ENGINE` and `SCHEDULED_QUERIES` were expected to be defined in the feature flag dictionary in the `config.py` file. These should now be defined as a top-level config, with the feature flag dictionary being reserved for boolean only values.
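
As a rough illustration of the 18976 entry above, a Redis-backed installation might set its shared defaults once in `superset_config.py`. This is a minimal sketch, not taken from the PR: the Redis URL and key prefix are assumptions, and only standard Flask-Caching option names are used.

```python
# superset_config.py: illustrative sketch, not part of this PR.
# The Redis URL and key prefix below are assumptions for the example.
from datetime import timedelta

# Base options that the other cache configs are merged on top of.
DEFAULT_CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_REDIS_URL": "redis://localhost:6379/0",
    "CACHE_KEY_PREFIX": "superset_",
    # Takes the place of the deprecated top-level CACHE_DEFAULT_TIMEOUT.
    "CACHE_DEFAULT_TIMEOUT": int(timedelta(days=1).total_seconds()),
}
```

Keys set in an individual cache config (for example `DATA_CACHE_CONFIG`) would still take precedence over these defaults.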
37 changes: 18 additions & 19 deletions docs/docs/installation/cache.mdx
@@ -7,37 +7,36 @@ version: 1

## Caching

Superset uses [Flask-Caching](https://flask-caching.readthedocs.io/) for caching purposes. For security reasons,
there are two separate cache configs for Superset's own metadata (`CACHE_CONFIG`) and charting data queried from
connected datasources (`DATA_CACHE_CONFIG`). However, query results from SQL Lab are stored in another backend
called `RESULTS_BACKEND`. See [Async Queries via Celery](/docs/installation/async-queries-celery) for details.

Configuring caching is as easy as providing `CACHE_CONFIG` and `DATA_CACHE_CONFIG` in your
Superset uses [Flask-Caching](https://flask-caching.readthedocs.io/) for caching purposes. Default caching options
can be set by overriding the `DEFAULT_CACHE_CONFIG` in your `superset_config.py`. Unless overridden, the default
cache type will be set to `SimpleCache` when running in debug mode, and `NullCache` otherwise.

Currently there are five separate cache configurations to provide additional security and more granular customization options:
- Metadata cache (optional): `CACHE_CONFIG`
- Charting data queried from datasets (optional): `DATA_CACHE_CONFIG`
- SQL Lab query results (optional): `RESULTS_BACKEND`. See [Async Queries via Celery](/docs/installation/async-queries-celery) for details
- Dashboard filter state (required): `FILTER_STATE_CACHE_CONFIG`
- Explore chart form data (required): `EXPLORE_FORM_DATA_CACHE_CONFIG`

Configuring caching is as easy as providing a custom cache config in your
`superset_config.py` that complies with [the Flask-Caching specifications](https://flask-caching.readthedocs.io/en/latest/#configuring-flask-caching).

Flask-Caching supports various caching backends, including Redis, Memcached, SimpleCache (in-memory), or the
local filesystem.
local filesystem. Custom cache backends are also supported. See [here](https://flask-caching.readthedocs.io/en/latest/#custom-cache-backends) for specifics.

Note that Dashboard and Explore caching is required, and configuring the application with either of these caches set to `NullCache` will
cause the application to fail on startup. Also keep in mind that when running Superset in a multi-worker setup, a dedicated cache is required.
For this we recommend running either Redis or Memcached:

- Redis (recommended): we recommend the [redis](https://pypi.python.org/pypi/redis) Python package
- Memcached: we recommend using [pylibmc](https://pypi.org/project/pylibmc/) client library as
`python-memcached` does not handle storing binary data correctly.
- Redis: we recommend the [redis](https://pypi.python.org/pypi/redis) Python package

Both of these libraries can be installed using pip.
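
For the two required caches, a multi-worker deployment could point both at a dedicated Redis instance. The sketch below is only an example under assumptions: the Redis URL and key prefixes are made up, while the 90-day and 7-day timeouts mirror the defaults shown for these caches in `superset/config.py`.

```python
# superset_config.py: illustrative sketch; URL and key prefixes are assumptions.
from datetime import timedelta

redis_url = "redis://localhost:6379/1"  # hypothetical dedicated cache instance

# Dashboard filter state (required, must not be NullCache)
FILTER_STATE_CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_REDIS_URL": redis_url,
    "CACHE_KEY_PREFIX": "superset_filter_state_",
    "CACHE_DEFAULT_TIMEOUT": int(timedelta(days=90).total_seconds()),
}

# Explore chart form data (required, must not be NullCache)
EXPLORE_FORM_DATA_CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_REDIS_URL": redis_url,
    "CACHE_KEY_PREFIX": "superset_explore_form_data_",
    "CACHE_DEFAULT_TIMEOUT": int(timedelta(days=7).total_seconds()),
}
```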

For chart data, Superset goes up a “timeout search path”, from a slice's configuration
to the datasource’s, the database’s, then ultimately falls back to the global default
defined in `DATA_CACHE_CONFIG`.

```
DATA_CACHE_CONFIG = {
    'CACHE_TYPE': 'redis',
    'CACHE_DEFAULT_TIMEOUT': 60 * 60 * 24, # 1 day default (in secs)
    'CACHE_KEY_PREFIX': 'superset_results',
    'CACHE_REDIS_URL': 'redis://localhost:6379/0',
}
```
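
The "timeout search path" for chart data can be pictured as a simple first-match lookup. The function below is a hypothetical sketch, not Superset's implementation: `resolve_cache_timeout` and its parameter names are invented for illustration, and it assumes an explicit `0` counts as a configured value.

```python
from typing import Optional


def resolve_cache_timeout(
    slice_timeout: Optional[int],
    datasource_timeout: Optional[int],
    database_timeout: Optional[int],
    global_default: int,
) -> int:
    """Return the first explicitly set timeout, walking up the search path."""
    for candidate in (slice_timeout, datasource_timeout, database_timeout):
        # `is not None` so that an explicit 0 is honoured rather than skipped
        if candidate is not None:
            return candidate
    # Nothing set on the slice, datasource or database: use the global default
    return global_default
```

For example, a chart with no timeout of its own on a datasource configured with 300 seconds would resolve to 300, regardless of the database and global values.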

Custom cache backends are also supported. See [here](https://flask-caching.readthedocs.io/en/latest/#custom-cache-backends) for specifics.

Superset has a Celery task that will periodically warm up the cache based on different strategies.
To use it, add the following to the `CELERYBEAT_SCHEDULE` section in `config.py`:
2 changes: 1 addition & 1 deletion superset/common/query_context_processor.py
@@ -385,7 +385,7 @@ def get_cache_timeout(self) -> int:
cache_timeout_rv = self._query_context.get_cache_timeout()
if cache_timeout_rv:
return cache_timeout_rv
return config["CACHE_DEFAULT_TIMEOUT"]
return app.config["DEFAULT_CACHE_CONFIG"]["CACHE_DEFAULT_TIMEOUT"]

def cache_key(self, **extra: Any) -> str:
"""
33 changes: 16 additions & 17 deletions superset/config.py
@@ -36,7 +36,7 @@
from cachelib.base import BaseCache
from celery.schedules import crontab
from dateutil import tz
from flask import Blueprint
from flask import Blueprint, Flask
from flask_appbuilder.security.manager import AUTH_DB
from pandas._libs.parsers import STR_NA_VALUES # pylint: disable=no-name-in-module
from typing_extensions import Literal
@@ -543,8 +543,8 @@ def _try_json_readsha(filepath: str, length: int) -> Optional[str]:
# Also used by Alerts & Reports
# ---------------------------------------------------
THUMBNAIL_SELENIUM_USER = "admin"
# thumbnail cache (will be merged with DEFAULT_CACHE_CONFIG)
THUMBNAIL_CACHE_CONFIG: CacheConfig = {
"CACHE_TYPE": "null",
"CACHE_NO_NULL_WARNING": True,
}

@@ -576,31 +576,30 @@ def _try_json_readsha(filepath: str, length: int) -> Optional[str]:
# Setup image size default is (300, 200, True)
# IMG_SIZE = (300, 200, True)

# Default cache timeout, applies to all cache backends unless specifically overridden in
# each cache config.
CACHE_DEFAULT_TIMEOUT = int(timedelta(days=1).total_seconds())
# Default cache for Superset objects (will be used as the base for all cache configs)
DEFAULT_CACHE_CONFIG: CacheConfig = {
"CACHE_TYPE": "NullCache",
"CACHE_DEFAULT_TIMEOUT": int(timedelta(days=1).total_seconds()),
}

# Default cache for Superset objects
CACHE_CONFIG: CacheConfig = {"CACHE_TYPE": "null"}
# Default cache for Superset objects (will be merged with DEFAULT_CACHE_CONFIG)
CACHE_CONFIG: CacheConfig = {}

# Cache for datasource metadata and query results
DATA_CACHE_CONFIG: CacheConfig = {"CACHE_TYPE": "null"}
# Cache for datasource metadata and query results (will be merged with
# DEFAULT_CACHE_CONFIG)
DATA_CACHE_CONFIG: CacheConfig = {}

# Cache for filters state
# Cache for filters state (will be merged with DEFAULT_CACHE_CONFIG)
FILTER_STATE_CACHE_CONFIG: CacheConfig = {
"CACHE_TYPE": "FileSystemCache",
"CACHE_DIR": os.path.join(DATA_DIR, "cache"),
"CACHE_DEFAULT_TIMEOUT": int(timedelta(days=90).total_seconds()),
"CACHE_THRESHOLD": 0,
# should the timeout be reset when retrieving a cached value
"REFRESH_TIMEOUT_ON_RETRIEVAL": True,
}

# Cache for chart form data
# Cache for chart form data (will be merged with DEFAULT_CACHE_CONFIG)
EXPLORE_FORM_DATA_CACHE_CONFIG: CacheConfig = {
"CACHE_TYPE": "FileSystemCache",
"CACHE_DIR": os.path.join(DATA_DIR, "cache"),
"CACHE_DEFAULT_TIMEOUT": int(timedelta(days=7).total_seconds()),
"CACHE_THRESHOLD": 0,
# should the timeout be reset when retrieving a cached value
"REFRESH_TIMEOUT_ON_RETRIEVAL": True,
}

4 changes: 3 additions & 1 deletion superset/sql_lab.py
@@ -538,7 +538,9 @@ def execute_sql_statements( # pylint: disable=too-many-arguments, too-many-loca
)
cache_timeout = database.cache_timeout
if cache_timeout is None:
cache_timeout = config["CACHE_DEFAULT_TIMEOUT"]
cache_timeout = app.config["DEFAULT_CACHE_CONFIG"][
"CACHE_DEFAULT_TIMEOUT"
]

compressed = zlib_compress(serialized_payload)
logger.debug(
16 changes: 2 additions & 14 deletions superset/typing.py
@@ -15,20 +15,8 @@
# specific language governing permissions and limitations
# under the License.
from datetime import datetime
from typing import (
Any,
Callable,
Dict,
List,
Optional,
Sequence,
Tuple,
TYPE_CHECKING,
Union,
)
from typing import Any, Dict, List, Optional, Sequence, Tuple, TYPE_CHECKING, Union

from flask import Flask
from flask_caching import Cache
from typing_extensions import Literal, TypedDict
from werkzeug.wrappers import Response

@@ -69,7 +57,7 @@ class AdhocColumn(TypedDict, total=False):
sqlExpression: Optional[str]


CacheConfig = Union[Callable[[Flask], Cache], Dict[str, Any]]
CacheConfig = Dict[str, Any]
DbapiDescriptionRow = Tuple[
str, str, Optional[str], Optional[str], Optional[int], Optional[int], bool
]
11 changes: 8 additions & 3 deletions superset/utils/cache.py
@@ -21,14 +21,15 @@
from functools import wraps
from typing import Any, Callable, Dict, Optional, TYPE_CHECKING, Union

from flask import current_app as app, request
from flask import current_app as app, Flask, request
from flask_caching import Cache
from flask_caching.backends import NullCache
from werkzeug.wrappers.etag import ETagResponseMixin

from superset import db
from superset.extensions import cache_manager
from superset.models.cache import CacheKey
from superset.typing import CacheConfig
from superset.utils.core import json_int_dttm_ser
from superset.utils.hashing import md5_sha_from_dict

@@ -55,7 +56,11 @@ def set_and_log_cache(
if isinstance(cache_instance.cache, NullCache):
return

timeout = cache_timeout if cache_timeout else config["CACHE_DEFAULT_TIMEOUT"]
timeout = (
cache_timeout
if cache_timeout is not None
else app.config["DEFAULT_CACHE_CONFIG"]["CACHE_DEFAULT_TIMEOUT"]
)
try:
dttm = datetime.utcnow().isoformat().split(".")[0]
value = {**cache_value, "dttm": dttm}
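
The switch in `set_and_log_cache` from a truthiness test to `is not None` matters for an explicit timeout of `0`, which Flask-Caching backends typically treat as "never expire". The stand-alone comparison below is a sketch: the helper names and the `DEFAULT_TIMEOUT` constant are hypothetical stand-ins for the config lookup in the hunk above.

```python
from typing import Optional

# Stand-in for app.config["DEFAULT_CACHE_CONFIG"]["CACHE_DEFAULT_TIMEOUT"]
DEFAULT_TIMEOUT = 86400


def timeout_truthy(cache_timeout: Optional[int]) -> int:
    # Earlier check: 0 is falsy, so an explicit 0 falls back to the default
    return cache_timeout if cache_timeout else DEFAULT_TIMEOUT


def timeout_not_none(cache_timeout: Optional[int]) -> int:
    # Newer check: only a missing value (None) falls back to the default
    return cache_timeout if cache_timeout is not None else DEFAULT_TIMEOUT
```

Both helpers agree for `None` and for positive values; they differ only when the caller passes `0`.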
@@ -146,7 +151,7 @@ def etag_cache(

"""
if max_age is None:
max_age = app.config["CACHE_DEFAULT_TIMEOUT"]
max_age = app.config["DEFAULT_CACHE_CONFIG"]["CACHE_DEFAULT_TIMEOUT"]

def decorator(f: Callable[..., Any]) -> Callable[..., Any]:
@wraps(f)
119 changes: 62 additions & 57 deletions superset/utils/cache_manager.py
@@ -14,73 +14,78 @@
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
import logging
import math

from flask import Flask
from flask_babel import gettext as _
from flask_caching import Cache

from superset.typing import CacheConfig

logger = logging.getLogger(__name__)


class CacheManager:
def __init__(self) -> None:
super().__init__()

self._cache = Cache()
self._data_cache = Cache()
self._thumbnail_cache = Cache()
self._filter_state_cache = Cache()
self._explore_form_data_cache = Cache()
self._default_cache_config: CacheConfig = {}
self.cache = Cache()
self.data_cache = Cache()
self.thumbnail_cache = Cache()
self.filter_state_cache = Cache()
self.explore_form_data_cache = Cache()

def _init_cache(
self, app: Flask, cache: Cache, cache_config_key: str, required: bool = False
) -> None:
config = {**self._default_cache_config, **app.config[cache_config_key]}
if required and config["CACHE_TYPE"] in ("null", "NullCache"):
raise Exception(
_(
"The CACHE_TYPE `%(cache_type)s` for `%(cache_config_key)s` is not "
"supported. It is recommended to use `RedisCache`, `MemcachedCache` "
"or another dedicated caching backend for production deployments",
cache_type=config["CACHE_TYPE"],
cache_config_key=cache_config_key,
),
)
cache.init_app(app, config)

def init_app(self, app: Flask) -> None:
self._cache.init_app(
app,
{
"CACHE_DEFAULT_TIMEOUT": app.config["CACHE_DEFAULT_TIMEOUT"],
**app.config["CACHE_CONFIG"],
},
)
self._data_cache.init_app(
app,
{
"CACHE_DEFAULT_TIMEOUT": app.config["CACHE_DEFAULT_TIMEOUT"],
**app.config["DATA_CACHE_CONFIG"],
},
)
self._thumbnail_cache.init_app(
app,
{
"CACHE_DEFAULT_TIMEOUT": app.config["CACHE_DEFAULT_TIMEOUT"],
**app.config["THUMBNAIL_CACHE_CONFIG"],
},
)
self._filter_state_cache.init_app(
app,
{
"CACHE_DEFAULT_TIMEOUT": app.config["CACHE_DEFAULT_TIMEOUT"],
**app.config["FILTER_STATE_CACHE_CONFIG"],
},
if app.debug:
self._default_cache_config = {
"CACHE_TYPE": "SimpleCache",
"CACHE_THRESHOLD": math.inf,
}
else:
self._default_cache_config = {}

default_timeout = app.config.get("CACHE_DEFAULT_TIMEOUT")
if default_timeout is not None:
self._default_cache_config["CACHE_DEFAULT_TIMEOUT"] = default_timeout
logger.warning(
_(
"The global config flag `CACHE_DEFAULT_TIMEOUT` has been "
"deprecated and will be removed in Superset 2.0. Please set "
"default cache options in the `DEFAULT_CACHE_CONFIG` parameter"
),
)
self._default_cache_config = {
**self._default_cache_config,
**app.config["DEFAULT_CACHE_CONFIG"],
}

self._init_cache(app, self.cache, "CACHE_CONFIG")
self._init_cache(app, self.data_cache, "DATA_CACHE_CONFIG")
self._init_cache(app, self.thumbnail_cache, "THUMBNAIL_CACHE_CONFIG")
self._init_cache(
app, self.filter_state_cache, "FILTER_STATE_CACHE_CONFIG", required=True
)
self._explore_form_data_cache.init_app(
self._init_cache(
app,
{
"CACHE_DEFAULT_TIMEOUT": app.config["CACHE_DEFAULT_TIMEOUT"],
**app.config["EXPLORE_FORM_DATA_CACHE_CONFIG"],
},
self.explore_form_data_cache,
"EXPLORE_FORM_DATA_CACHE_CONFIG",
required=True,
)

@property
def data_cache(self) -> Cache:
return self._data_cache

@property
def cache(self) -> Cache:
return self._cache

@property
def thumbnail_cache(self) -> Cache:
return self._thumbnail_cache

@property
def filter_state_cache(self) -> Cache:
return self._filter_state_cache

@property
def explore_form_data_cache(self) -> Cache:
return self._explore_form_data_cache
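
The precedence that the new `init_app` and `_init_cache` establish can be sketched without Flask as a plain dictionary merge. This is an approximation under assumptions: `build_cache_config` is a hypothetical helper, it raises `ValueError` where the diff raises a generic `Exception`, and it uses `.get()` so a missing `CACHE_TYPE` does not raise `KeyError`.

```python
import math
from typing import Any, Dict, Optional

NULL_CACHE_TYPES = ("null", "NullCache")


def build_cache_config(
    debug: bool,
    default_cache_config: Dict[str, Any],
    cache_config: Dict[str, Any],
    legacy_timeout: Optional[int] = None,
    required: bool = False,
) -> Dict[str, Any]:
    """Merge cache options from lowest to highest precedence (hypothetical helper)."""
    # 1. Debug mode starts from an unbounded in-memory cache.
    merged: Dict[str, Any] = (
        {"CACHE_TYPE": "SimpleCache", "CACHE_THRESHOLD": math.inf} if debug else {}
    )
    # 2. The deprecated top-level CACHE_DEFAULT_TIMEOUT, if it is still set.
    if legacy_timeout is not None:
        merged["CACHE_DEFAULT_TIMEOUT"] = legacy_timeout
    # 3. DEFAULT_CACHE_CONFIG, then 4. the cache-specific config on top.
    merged = {**merged, **default_cache_config, **cache_config}
    # Required caches must not resolve to a no-op backend.
    if required and merged.get("CACHE_TYPE") in NULL_CACHE_TYPES:
        raise ValueError(f"CACHE_TYPE {merged['CACHE_TYPE']!r} is not supported here")
    return merged
```

One consequence of this merge order, as far as the diff shows it: a `CACHE_TYPE` set in `DEFAULT_CACHE_CONFIG` overrides the debug-mode `SimpleCache`, because it is merged later.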
2 changes: 1 addition & 1 deletion superset/viz.py
@@ -429,7 +429,7 @@ def cache_timeout(self) -> int:
return self.datasource.database.cache_timeout
if config["DATA_CACHE_CONFIG"].get("CACHE_DEFAULT_TIMEOUT") is not None:
return config["DATA_CACHE_CONFIG"]["CACHE_DEFAULT_TIMEOUT"]
return config["CACHE_DEFAULT_TIMEOUT"]
return app.config["DEFAULT_CACHE_CONFIG"]["CACHE_DEFAULT_TIMEOUT"]

def get_json(self) -> str:
return json.dumps(
14 changes: 9 additions & 5 deletions tests/integration_tests/cache_tests.py
@@ -43,7 +43,7 @@ def tearDown(self):
@pytest.mark.usefixtures("load_birth_names_dashboard_with_slices")
def test_no_data_cache(self):
data_cache_config = app.config["DATA_CACHE_CONFIG"]
app.config["DATA_CACHE_CONFIG"] = {"CACHE_TYPE": "null"}
app.config["DATA_CACHE_CONFIG"] = {"CACHE_TYPE": "NullCache"}
cache_manager.init_app(app)

slc = self.get_slice("Girls", db.session)
@@ -64,11 +64,15 @@ def test_no_data_cache(self):
@pytest.mark.usefixtures("load_birth_names_dashboard_with_slices")
def test_slice_data_cache(self):
# Override cache config
default_cache_config = app.config["DEFAULT_CACHE_CONFIG"]

app.config["DEFAULT_CACHE_CONFIG"] = {
**default_cache_config,
"CACHE_DEFAULT_TIMEOUT": 100,
}
data_cache_config = app.config["DATA_CACHE_CONFIG"]
cache_default_timeout = app.config["CACHE_DEFAULT_TIMEOUT"]
app.config["CACHE_DEFAULT_TIMEOUT"] = 100

app.config["DATA_CACHE_CONFIG"] = {
"CACHE_TYPE": "simple",
"CACHE_DEFAULT_TIMEOUT": 10,
"CACHE_KEY_PREFIX": "superset_data_cache",
}
@@ -101,5 +105,5 @@ def test_slice_data_cache(self):

# reset cache config
app.config["DATA_CACHE_CONFIG"] = data_cache_config
app.config["CACHE_DEFAULT_TIMEOUT"] = cache_default_timeout
app.config["DEFAULT_CACHE_CONFIG"] = default_cache_config
cache_manager.init_app(app)